Interconnects Archive
Recent reasoning research: GRPO tweaks, base model RL, and data curation
The papers I endorse as worth reading among a cresting wave of reasoning research.
Mar 31 • Nathan Lambert
Interviewing Eugene Vinitsky on self-play for self-driving and what else people do with RL
#13. Reinforcement learning fundamentals and scaling.
Mar 12 • Nathan Lambert • 1:09:22
Elicitation, the simplest way to understand post-training
An F1 analogy to help understand fast improvements in post-training on top of slow improvements in scaling.
Mar 10 • Nathan Lambert
Character training: Understanding and crafting a language model's personality
Post-training in industry is very different than the academic papers and open-source models demonstrate. Let's dive into one of my favorite topics in…
Feb 26 • Nathan Lambert
An unexpected RL Renaissance
New talk! Forecasting the Alpaca moment for reasoning models and why the new style of RL training is a far bigger deal than the emergence of RLHF.
Feb 13 • Nathan Lambert • 39:48
DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs
Yes, ring the true o1 replication bells for DeepSeek R1 🔔🔔🔔. Where we go next.
Jan 21 • Nathan Lambert
The state of post-training in 2025
Watch now (54 mins) | A re-record of my NeurIPS tutorial on language modeling (plus some added content).
Jan 8 • Nathan Lambert • 53:50
OpenAI's o3: The grand finale of AI in 2024
A step change as influential as the release of GPT-4. Reasoning language models are the current big thing.
Dec 20, 2024 • Nathan Lambert
OpenAI's Reinforcement Finetuning and RL for the masses
The cherry on Yann LeCun's cake has finally been realized.
Dec 11, 2024 • Nathan Lambert
Interviewing Finbarr Timbers on the "We are So Back" Era of Reinforcement Learning
Listen now | Interconnects interview #11. An overview on the past, present, and future of RL.
Dec 5, 2024 • Nathan Lambert and Finbarr Timbers • 1:08:32
OpenAI's o1 using "search" was a PSYOP
How to understand OpenAI's o1 models as really just one wacky, wonderful, long chain of thought.
Dec 4, 2024 • Nathan Lambert
Tülu 3: The next era in open post-training
We give you open-source, frontier-model post-training.
Nov 21, 2024 • Nathan Lambert