Original post:
Tülu 3: The next era in open post-training
Post-training, the craft of eliciting powerful behaviors from a raw pretrained language model, has gone through many seasons and moods since the release of ChatGPT. In the era of Alpaca, Vicuna, Koala, and Dolly, a limited number of human datapoints with extended synthetic data in the style of
Chapters
00:00 History
05:44 Technical details sneak peak
Figures
Fig 1, results: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/results.webp
Fig 2, overview: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/overview.webp
Fig 3, preferences: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/preferences.webp
Fig 4, RLVR: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/rlvr.webp
Share this post