For rebranding purposes, I'm curious: what’s the difference between RLAIF and a GAN? In a GAN, a generative model attempts to fool (“red team”) a discriminator into thinking a generated sample came from a dataset of approved samples. In RLAIF, a generative model attempts to fool a discriminator (the reward model) into thinking a generated sample came from a dataset of approved samples.
Is the only difference that one of them uses SGD and updates the discriminator's weights along with the generator's?
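To make the question concrete, here is a toy sketch of how I picture the two loops. Everything in it is an assumption for illustration: the "generator" is a single scalar `g`, the "discriminator" is a logistic scorer with hypothetical parameters `w` and `b`, and the function names `gan_step` and `frozen_reward_step` are made up. It also assumes the reward model stays frozen while the generator trains, and it uses a plain gradient step where a real RLAIF setup would more likely use a policy-gradient method, so treat it as a picture of which weights move, not as an implementation of either method.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gan_step(g, w, b, real=2.0, lr=0.1):
    """One GAN-style step: discriminator AND generator are both updated."""
    fake = g  # toy generator: its "sample" is just the parameter g
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    # Discriminator ascends log D(real) + log(1 - D(fake)).
    grad_w = (1.0 - d_real) * real - d_fake * fake
    grad_b = (1.0 - d_real) - d_fake
    w = w + lr * grad_w
    b = b + lr * grad_b
    # Generator ascends log D(fake) against the updated discriminator.
    d_fake = sigmoid(w * g + b)
    g = g + lr * (1.0 - d_fake) * w
    return g, w, b


def frozen_reward_step(g, w, b, lr=0.1):
    """One reward-maximization step: only the generator is updated."""
    score = sigmoid(w * g + b)  # the scorer is read, never trained here
    g = g + lr * (1.0 - score) * w  # ascends log score
    return g, w, b


if __name__ == "__main__":
    start = (0.0, 1.0, 0.0)  # g, w, b
    print("GAN-style step:     ", gan_step(*start))
    print("Frozen-scorer step: ", frozen_reward_step(*start))
```

In this toy, the only structural difference between the two functions is whether `w` and `b` change, which is exactly the distinction the question is asking about.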