Discussion about this post

skybrian:

This reminds me of "revealed preferences" in economics, which are inferred from what people do. It sounds like you're saying that revealed preferences aren't the kind of preferences you're interested in?

What's a better way to think about preferences?

K. Liam Smith:

Something I've been curious about: if you have a pretrained LLM and some domain-specific data, are you better off fine-tuning the LLM directly with SGD, or training a reward model and using it to refine the LLM? For example, say you wanted to generate news stories about sports and had some example stories. Would you fine-tune on those stories, or train a reward model on sports vs. non-sports stories and use it to refine the model? I'm not sure what the best practice is here.
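The two options in the question can be caricatured with a toy, stdlib-only sketch (everything below is a hypothetical illustration, not either approach as practiced on real LLMs): treat the "pretrained model" as a unigram word distribution, treat supervised fine-tuning as interpolating that distribution toward the domain data, and treat the reward model as a sports-vs-not word classifier whose smoothed likelihood ratio reweights the base distribution at generation time.

```python
from collections import Counter

# Toy "pretrained model" and "domain data": unigram counts over two
# made-up corpora (purely illustrative).
pretrain = Counter("the game was long the report was dull".split())
sports = Counter("the game was long the score was close".split())

def sft(base, domain, lr=0.5):
    """(1) Supervised fine-tuning, caricatured as interpolating the base
    distribution toward the empirical distribution of the domain data."""
    total_b, total_d = sum(base.values()), sum(domain.values())
    vocab = set(base) | set(domain)
    return {w: (1 - lr) * base[w] / total_b + lr * domain[w] / total_d
            for w in vocab}

def reward(word, pos=sports, neg=pretrain):
    """(2) A 'reward model', caricatured as a smoothed likelihood ratio:
    how much more often a word appears in sports text than elsewhere."""
    return (pos[word] + 1) / (neg[word] + 1)

def reward_guided(base, beta=1.0):
    """Reweight the base distribution by reward**beta, then renormalize.
    Note: this can only reweight words the base model already produces."""
    total = sum(base.values())
    scores = {w: (base[w] / total) * reward(w) ** beta for w in base}
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

p_sft = sft(pretrain, sports)      # gains mass on new words like "score"
p_rm = reward_guided(pretrain)     # boosts "game" relative to "report"
```

One structural difference the toy makes visible: fine-tuning on the domain data can put probability on words the base model never produced (`"score"` appears in `p_sft`), while reward-based reweighting only shifts mass among outputs the base model already generates. In practice the common recipe combines both, supervised fine-tuning first and reward-model-based refinement afterward, but which works better for a given dataset is an empirical question.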

7 more comments...
