3 Comments
Dec 6, 2023 · Liked by Nathan Lambert

I'm seeing a lot of attempts by startups to create enterprise products out of cutting-edge research like this. But they're not always closely scrutinized b/c they keep their work so close to the chest.

Write-ups like these are incredibly insightful, especially the question section. Really keeps you grounded and shows how convoluted breakthroughs can be.

Hi Dr. Lambert, are you aware of any papers or research works that empirically demonstrate PPO's superiority over DPO in certain datasets or tasks?

author

That's the problem: it's mostly behind the closed doors of big companies. Hoping to improve that in the new year (people who have expressed interest: AI2, Stanford, Nvidia, and anyone who wants to help).
