Reverse engineering OpenAI’s o1

Sep 16, 2024

What productionizing test-time compute shows us about the future of AI. Exploration has landed in language model training.


2 Comments

James Wang

Sep 24

Fascinating post! It’s really interesting to see RLHF progressing this way. It’s one of those ideas that people thought had legs but that did very poorly in the early 2010s.

Even if modular, it should be interesting to see the open models try more of these approaches as well. In particular, I really want to get a better sense of the construction of the rewards and policies—and also how folks might start to use simulated data as well.



Nathan Lambert

Sep 24

Agreed. I'm excited to work on it.
