RLHF
Tülu 3: The next era in open post-training
We give you open-source, frontier-model post-training.
Nov 21 • Nathan Lambert
Reverse engineering OpenAI’s o1
What productionizing test-time compute shows us about the future of AI. Exploration has landed in language model training.
Sep 16 • Nathan Lambert
Futures of the data foundry business model
Scale AI’s future versus further scaling of language model performance. How Nvidia may take all the margins from the data market, too.
Sep 11 • Nathan Lambert
A post-training approach to AI regulation with Model Specs
And why the concept of mandating “model specs” could be a good start.
Sep 9 • Nathan Lambert
A recipe for frontier model post-training
Apple, Meta, and Nvidia all agree — synthetic data, iterative training, human preference labels, and lots of filtering.
Aug 7 • Nathan Lambert
RLHF roundup: Getting good at PPO, sketching RLHF’s impact, RewardBench retrospective, and a reward model competition
Things to be aware of if you work on language model fine-tuning.
Jun 26 • Nathan Lambert
OpenAI’s Model (behavior) Spec, RLHF transparency, personalization questions
Now we will have some grounding for when weird ChatGPT behaviors are intended or side-effects — shrinking the Overton window of RLHF bugs.
May 10 • Nathan Lambert
How RLHF works, part 2: A thin line between useful and lobotomized
Many, many signs of life for preference fine-tuning beyond spoofing chat evaluation tools.
May 1 • Nathan Lambert
Stop "reinventing" everything to solve alignment
Integrating some non-computing science into reinforcement learning from human feedback (RLHF) can give us the models we want.
Apr 17 • Nathan Lambert
Why reward models are key for alignment
In an era dominated by direct preference optimization and LLM-as-a-judge, why do we still need a model to output only a scalar reward?
Feb 14 • Nathan Lambert
Alignment-as-a-service: Scale AI vs. the new guys
Scale’s making over $750 million per year selling data for RLHF. Who’s coming to take it?
Feb 7 • Nathan Lambert
RLHF learning resources in 2024
A list for beginners, wannabe experts, and everyone in between.
Jan 12 • Nathan Lambert