Interconnects
RLHF roundup: Getting good at PPO, sketching RLHF’s impact, RewardBench retrospective, and a reward model competition
Nathan Lambert
Jun 26, 2024
Things to be aware of if you work on language model fine-tuning.