3 Comments
K. Liam Smith

For rebranding purposes, I'm curious: what’s the difference between RLAIF and a GAN? In a GAN, a generative model attempts to fool (“red team”) a discriminator into thinking a generated sample came from a dataset of approved samples. In RLAIF, a generative model attempts to fool a discriminator (the reward model) into thinking a generated sample came from a dataset of approved samples.

Is the only difference that one uses SGD and the discriminator weights are updated along with the generator?
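
To make the comparison concrete, here is a toy sketch (not anything from the post itself). It assumes a 1-D Gaussian generator with mean `mu` and a logistic scorer `(w, b)`; every name in it is hypothetical. In the GAN step the scorer (discriminator) is retrained against fresh fakes and the generator gets its gradient through the scorer. In the RLAIF-style step the scorer (reward model) stays frozen and the generator is updated with a REINFORCE-style policy gradient on the scalar reward. Real RLAIF setups commonly add pieces this leaves out, such as PPO, a KL penalty, and a reward model trained on preference comparisons.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(scorer, x):
    """Approval score in (0, 1) from a logistic scorer with params (w, b)."""
    w, b = scorer
    return sigmoid(w * x + b)


def gan_step(mu, disc, real_x, rng, lr=0.05):
    """One GAN-style step: the discriminator AND the generator are updated."""
    fake_x = mu + rng.gauss(0.0, 1.0)
    w, b = disc
    d_real, d_fake = score(disc, real_x), score(disc, fake_x)
    # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
    w += lr * ((1.0 - d_real) * real_x - d_fake * fake_x)
    b += lr * ((1.0 - d_real) - d_fake)
    disc = (w, b)
    # Generator: ascent on log D(fake); the gradient flows through D
    # (fake_x = mu + noise, so d fake_x / d mu = 1).
    mu += lr * (1.0 - score(disc, fake_x)) * w
    return mu, disc


def rlaif_step(mu, reward_model, rng, lr=0.05):
    """One RLAIF-style step (assumed setup): reward model frozen, policy updated."""
    x = mu + rng.gauss(0.0, 1.0)
    reward = score(reward_model, x)
    # REINFORCE: d/d mu of log N(x; mu, 1) is (x - mu); no gradient through the scorer.
    mu += lr * reward * (x - mu)
    return mu, reward_model


rng = random.Random(0)
scorer = (1.0, 0.0)
gan_mu, gan_disc = 0.0, scorer
rl_mu, rl_reward = 0.0, scorer
for _ in range(200):
    gan_mu, gan_disc = gan_step(gan_mu, gan_disc, real_x=2.0, rng=rng)
    rl_mu, rl_reward = rlaif_step(rl_mu, rl_reward, rng=rng)
```

After the loop, `gan_disc` has drifted away from its starting parameters while `rl_reward` is still exactly `scorer`, which is the structural difference the question is pointing at.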

Nathan Lambert

Hmm, not sure I 100% follow, but maybe? It could be related to what you're saying and the recent paper where Google showed synthetic data helps ImageNet performance. https://arxiv.org/abs/2304.08466

Can you elaborate?

Robin Allenson

The same thing occurred to me, Liam. It is a very similar idea.
