I'm seeing a lot of attempts by startups to create enterprise products out of cutting-edge research like this. But they're not always closely scrutinized b/c they keep their work so close to their chest.
Write-ups like these are incredibly insightful, especially the question section. Really keeps you grounded and shows how convoluted breakthroughs can be.
Hi Dr. Lambert, are you aware of any papers or other research that empirically demonstrates PPO's superiority over DPO on certain datasets or tasks?
That's the problem: it's mostly behind the closed doors of big companies. Hoping to improve that in the new year (people who have expressed interest: AI2, Stanford, Nvidia, and anyone who wants to help).