Fascinating post! It’s really interesting to see RLHF progressing this way. It’s one of those ideas that people thought had legs but that did very poorly in the early 2010s.
Even if it ends up modular, it will be interesting to see the open models try more of these approaches too. In particular, I’d really like a better sense of how the rewards and policies are constructed, and of how folks might start to use simulated data.
Agreed. I'm excited to work on it.