For rebranding purposes, I'm curious: what’s the difference between RLAIF and a GAN? In a GAN, a generative model attempts to fool (“red team”) a discriminator into thinking a generated sample came from a dataset of approved samples. In RLAIF, a generative model attempts to fool a discriminator (the reward model) into thinking a generated sample came from a dataset of approved samples.
Is the only difference that one of them uses SGD and updates the discriminator weights along with the generator?
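A toy sketch of the distinction being asked about, not an implementation of either method: a 1-D generator with a single parameter `theta` and a logistic scorer `sigmoid(w*x + b)`. All names here (`train`, `theta`, `w`, `b`, `update_scorer`) are made up for illustration. With `update_scorer=True` the scorer is trained alongside the generator, GAN-style; with `update_scorer=False` it stays frozen, as a reward model usually is during the RL phase. Real RLAIF would typically use a policy-gradient method such as PPO, often with a KL penalty, rather than the direct gradient used here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(theta, w, b, real_data, steps, lr, update_scorer, seed=0):
    """Toy 1-D loop. Generator: x = theta + noise. Scorer: sigmoid(w*x + b).

    update_scorer=True  -> GAN-style: scorer is trained with the generator.
    update_scorer=False -> frozen scorer, standing in for a fixed reward model.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        x_fake = theta + rng.gauss(0.0, 0.1)
        s_fake = sigmoid(w * x_fake + b)

        # Generator step: ascend log(score) of its own sample.
        # d/dtheta log sigmoid(w*x + b) = (1 - s) * w
        theta += lr * (1.0 - s_fake) * w

        if update_scorer:
            x_real = rng.choice(real_data)
            s_real = sigmoid(w * x_real + b)
            # Scorer step: ascend log D(real) + log(1 - D(fake)).
            w += lr * ((1.0 - s_real) * x_real - s_fake * x_fake)
            b += lr * ((1.0 - s_real) - s_fake)
    return theta, w, b


# Hypothetical "approved" samples clustered around 3.
real = [3.0, 3.1, 2.9, 3.2, 2.8]
gan = train(0.0, 1.0, 0.0, real, steps=100, lr=0.05, update_scorer=True)
frozen = train(0.0, 1.0, 0.0, real, steps=100, lr=0.05, update_scorer=False)
print("GAN-style    (theta, w, b):", gan)
print("frozen scorer (theta, w, b):", frozen)
```

In the frozen run the scorer comes back unchanged and the generator just keeps climbing whatever the scorer already rewards; in the GAN-style run the scorer moves too, so the target the generator is chasing shifts as it trains.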
Hmm, not sure I 100% follow, but maybe? It could be related to what you're saying and the recent paper where Google showed synthetic data helps ImageNet performance. https://arxiv.org/abs/2304.08466
Can you elaborate?
The same thing occurred to me, Liam. It is a very similar idea.