4 Comments

An honest question on how this works, because there’s a lot of broad speculation, and I’ve gotten confused as to what this stuff is or isn’t. There’s been a lot of loose language and redefinition of previous terminology that has other connotations, if not hard definitions (AGI, what it is or isn’t now… and “reasoning” here).

Some of OpenAI’s rather… convenient… marketing language (understandable given their business priorities) hasn’t helped either.

At the same time, this is a “single forward pass of a model that is trained with RL,” but it chooses the most common response with something like a consensus@N method… yet it doesn’t have an evaluator model? And results are often replicable with CoT from repeated prompting in a “non-reasoning” model?

What is the actual nature of the reasoning here? I can understand the conceptual idea (whether or not it’s actually implemented this way) of “run it a bunch of times, with extra internal prompting, to get more consistent and get further.” That would also make conceptual sense of why the inference costs and time scale as they do.

But then, if it’s just a forward pass of a plain old language model, are we saying that it generates the tokens in that same way and hides things until the final output to the user? That would also fit the conceptual model and would explain why some of these repeated-prompting cases have replicated o1 results.

Or is this a completely wrong understanding of what this is?

Replying to questions on o1/o3 inline because I get them a lot :)

> Some of OpenAI’s rather… convenient… marketing language (understandable given their business priorities) hasn’t helped either.

Yes. I’ve heard anonymous tips from OpenAI that specifically the “test-time compute” x-axis on their now-famous plot is misleading. It could be a combo of inference compute + RL training compute. Not clear.

> At the same time, this is a “single forward pass of a model that is trained with RL,” but it chooses the most common response with something like a consensus@N method… yet it doesn’t have an evaluator model? And results are often replicable with CoT from repeated prompting in a “non-reasoning” model?

Consensus@N is what is likely used for “pro” mode and will be an API parameter in the future (an expensive one). The “non-reasoning” model part is showing that you can spend *even more* compute on a normal model and often still find the answer.
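
For the curious, here’s a rough sketch of what consensus@N means, i.e. a majority vote over sampled completions with no evaluator model. `generate` is a hypothetical sampling call, and taking the completion’s last line as “the answer” is a simplifying assumption, not how OpenAI actually extracts answers:

```python
from collections import Counter

def consensus_at_n(generate, prompt, n=16, temperature=0.8):
    """Sample n completions and majority-vote on the extracted answers."""
    answers = []
    for _ in range(n):
        completion = generate(prompt, temperature=temperature)
        # Simplifying assumption: the final answer is the completion's last line.
        answers.append(completion.strip().splitlines()[-1])
    # The most common answer wins; no separate evaluator model is involved.
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n
```

Note the cost scales linearly in N, which is why this would be an expensive API parameter.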

> What is the actual nature of the reasoning here?

I will comment more on this in 2025, but it is a slow and steady grind through many tokens to make progress on an answer and check its work. It’s a long intermediate process.

> are we saying that it generates the tokens in that same way and hides things until the final output to the user?

Yes, definitely.
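
A minimal sketch of that flow, assuming a hypothetical `generate` call and a made-up `<final>` delimiter (the actual separator between hidden reasoning and the visible answer isn’t public):

```python
REASONING_END = "<final>"  # hypothetical delimiter; the real format is not public

def answer_with_hidden_reasoning(generate, prompt, max_tokens=32_768):
    """One long sampling pass: the model grinds through many intermediate
    reasoning tokens, and only the text after the delimiter is surfaced."""
    full_output = generate(prompt, max_tokens=max_tokens)
    # partition() splits on the first occurrence of the delimiter.
    hidden_reasoning, _, final_answer = full_output.partition(REASONING_END)
    return final_answer.strip()  # hidden_reasoning is billed but never displayed
```

The hidden tokens still count toward inference compute, which fits the slow, expensive behavior you described.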

:D

Thanks so much! This is helpful for confirming my understanding (in a super direct way, vs. all the loose, often hyped descriptions out there of what it is or isn’t).

Yeah, it’s very confirming how many OpenAI people gave me a nod of approval when I walked back my first post and admitted I’d psyop’d myself like most people on Twitter. Standing strong with o3 being simple.
