A sampling of recent happenings in the multimodal space. Be sure to expect more this year.
Hi @Nathan - In the RLHF systems diagram above, in the 3rd stage where the policy LLM is being trained, we see the corrected-caption signal passed into the reward model. Is the reward model also being updated?
The reward model comes in a static form, as the output of stage b), so it is not updated while the policy is trained.