With the emergence of reinforcement learning from human feedback, we've been applying old techniques with a new guiding function (🤫 RLHF).
The implicit dynamics of optimizing costs vs. rewards vs. preferences
The implicit dynamics of optimizing costs vs…
The implicit dynamics of optimizing costs vs. rewards vs. preferences
With the emergence of reinforcement learning from human feedback, we've been applying old techniques with a new guiding function (🤫 RLHF).