How RLHF works, part 2: A thin line between useful and lobotomized
www.interconnects.ai
Nathan Lambert
May 1
Many, many signs of life for preference fine-tuning beyond spoofing chat evaluation tools.