Discussion about this post

James Wang

Reasoning further pushes the ability of these models to generalize. Totally agree there. There’s still nothing here to help the LLM wander outside of its existing distribution/vector space though, right?

The distinction I’m making is one that’s more obvious in an RL agent. One can argue that it’s still highly dependent on your cost/reward shaping, but you can have an agent learn to handle entirely novel situations. Often that “novelty” is fairly simplistic, depending on the case, but the problem is ultimately formulated quite differently.

An LLM, even with CoT, can refine its answers into the right part of the vector space. It also has a huge knowledge base, which makes its responses to most queries quite good, especially if it can zero in on the right part of that base. However, it has no ability to actually go “out of sample” entirely, just based on the conceptual way it operates. Is this understanding correct, or are you saying it generalizes beyond that?

Nicholas Wagner

Thanks for this post. I often find myself wondering exactly this question of how far reasoners will go in other domains. I was not aware Claude Sonnet 3.5 was interpolating between shorter (or no?) chains of thought and longer ones. If that works reliably, it seems like an obvious way to navigate the jagged frontier for maximum performance while controlling costs on tasks that don't require reasoning. So obvious that I wonder if there is a flaw I am missing that explains why no one else does this.

I am also always wondering what role government research funding can play in leading-edge problems. It seems like there are rich opportunities to develop other verifiers and to provide evidence that LLM-as-a-judge works in particular domains.

