Discussion about this post

Komal

I just discovered your blog! This was an incredible read. I would love to hear your take on current developments in instruction fine-tuning, and possibly a comparison of its capabilities when pitted against RLHF.

Shanvit Shetty

Great piece, Nathan — really enjoyed reading the full article, especially the clear breakdown on data quality, preference collection difficulties, and the nuances of what RLHF actually achieves.

Do you think applying RL inside an agent harness (e.g. Pi Agent with memory) could meaningfully improve outcomes compared to standard RLHF on chat models, especially for longer, multi-step tasks?
