With the emergence of reinforcement learning from human feedback (🤫 RLHF), we've been applying old techniques with a new guiding function.
It's interesting how uncertainty compounds here, and how preferences complicate RL agent design.
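To make "new guiding function" concrete, here's a minimal sketch, assuming the standard setup of training a reward model on pairwise human preferences with a Bradley-Terry loss; the names (`RewardModel`, `preference_loss`) and the toy data are my own illustration, not any particular library's API. The learned score then plays the role a hand-crafted reward used to.

```python
# Hypothetical sketch: a reward model learned from pairwise preferences.
# Humans preferred `chosen` segments over `rejected` ones; the model is
# trained with a Bradley-Terry loss and its output becomes the reward
# signal ("guiding function") for a downstream RL agent.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Scalar score per observation; this replaces a hand-written reward.
        return self.net(obs).squeeze(-1)

def preference_loss(model, chosen, rejected):
    # Bradley-Terry: maximize P(chosen preferred) = sigmoid(r_chosen - r_rejected).
    r_chosen = model(chosen).sum(dim=1)      # sum per-step scores over the segment
    r_rejected = model(rejected).sum(dim=1)
    return -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: a batch of 8 preference pairs, 10-step segments, 4-dim observations.
model = RewardModel(obs_dim=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
chosen, rejected = torch.randn(8, 10, 4), torch.randn(8, 10, 4)
loss = preference_loss(model, chosen, rejected)
loss.backward()
opt.step()
```

The compounding uncertainty shows up exactly here: the agent optimizes against a reward that is itself a noisy estimate of what people wanted, so errors in the preference model propagate into the policy.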