Interconnects
Interconnects
RLHF: A thin line between useful and lobotomized
0:00
Current time: 0:00 / Total time: -13:08
-13:08

RLHF: A thin line between useful and lobotomized

Many, many signs of life for preference fine-tuning beyond spoofing chat evaluation tools.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/how-rlhf-works-2

00:00 How RLHF works, part 2: A thin line between useful and lobotomized
04:27 The chattiness paradox
08:09 The mechanism for making models chattier
10:42 Next steps for RLHF research

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf/img_012.webp
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf/img_018.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/rlhf/img_025.png

Interconnects
Interconnects
Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories.