RLHF
Sycophancy and the art of the model
GPT-4o-simp, LMArena backlash, and people refusing to understand how messy and crucial RLHF is.
May 4 • Nathan Lambert
Recent reasoning research: GRPO tweaks, base model RL, and data curation
The papers I endorse as worth reading among a cresting wave of reasoning research.
Mar 31 • Nathan Lambert
Interviewing Eugene Vinitsky on self-play for self-driving and what else people do with RL
#13. Reinforcement learning fundamentals and scaling.
Mar 12 • Nathan Lambert • 1:09:22
Elicitation, the simplest way to understand post-training
An F1 analogy to help understand fast improvements in post-training on top of slow improvements in scaling.
Mar 10 • Nathan Lambert
Character training: Understanding and crafting a language model's personality
Post-training in industry is very different from what the academic papers and open-source models demonstrate. Let's dive into one of my favorite topics in…
Feb 26 • Nathan Lambert
An unexpected RL Renaissance
New talk! Forecasting the Alpaca moment for reasoning models and why the new style of RL training is a far bigger deal than the emergence of RLHF.
Feb 13 • Nathan Lambert • 39:48
DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs
Yes, ring the true o1 replication bells for DeepSeek R1 🔔🔔🔔. Where we go next.
Jan 21 • Nathan Lambert
The state of post-training in 2025
A re-record of my NeurIPS tutorial on language modeling, plus some added content (54 mins).
Jan 8 • Nathan Lambert • 53:50
OpenAI's o3: The grand finale of AI in 2024
A step change as influential as the release of GPT-4. Reasoning language models are the current big thing.
Dec 20, 2024 • Nathan Lambert
OpenAI's Reinforcement Finetuning and RL for the masses
The cherry on Yann LeCun’s cake has finally been realized.
Dec 11, 2024 • Nathan Lambert
Interviewing Finbarr Timbers on the "We are So Back" Era of Reinforcement Learning
Interconnects interview #11. An overview of the past, present, and future of RL.
Dec 5, 2024 • Nathan Lambert and Finbarr Timbers • 1:08:32
OpenAI's o1 using "search" was a PSYOP
How to understand OpenAI's o1 models as really just one wacky, wonderful, long chain of thought.
Dec 4, 2024 • Nathan Lambert