The rewards assigned by these models will shape the subjective experience of users, and that subjective experience in turn controls their decision-making.
This reminds me of "revealed preferences" in economics, which are inferred from what people do. It sounds like you're saying that revealed preferences aren't the kind of preferences you're interested in?
What's a better way to think about preferences?
Something I've been curious about: if you have a pretrained LLM and some domain-specific data, are you better off fine-tuning the LLM with SGD or using a reward model to refine it? For example, say you wanted to generate news stories about sports and had some example stories. Would you fine-tune the model on those stories, or train a reward model on sports vs. non-sports stories and use it to refine the model? I'm not sure what the best practice is here.
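I don't have a definitive answer either, but here's a toy numpy sketch of the contrast (all data, rates, and the three-word "vocabulary" are made up, and logistic regression stands in for a real reward model): option A fits the generating distribution directly on in-domain examples (the SGD fine-tuning analogue), while option B trains a sports/non-sports classifier and uses its score as a reward to rerank samples from the base model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy features: counts of three hypothetical words ["goal", "market", "weather"].
# Sports stories are heavy on word 0; other stories on words 1 and 2.
sports = rng.poisson(lam=[5.0, 0.5, 0.5], size=(50, 3)).astype(float)
other = rng.poisson(lam=[0.5, 3.0, 3.0], size=(50, 3)).astype(float)

# --- Option A analogue: fit the generator directly on in-domain data
# (for an LLM this would be SGD fine-tuning on the sports stories).
mle_rates = sports.mean(axis=0)  # maximum-likelihood Poisson rates

# --- Option B analogue: train a "reward model" (logistic regression via
# plain gradient descent) on sports vs. non-sports examples.
X = np.vstack([sports, other])
y = np.array([1.0] * 50 + [0.0] * 50)
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))  # predicted P(sports)
    w -= 0.1 * (X.T @ (p - y)) / len(y)  # gradient step on log loss

# Use the reward model to rerank candidate "generations" from an
# unadapted base model (here: generic Poisson draws).
candidates = rng.poisson(lam=[2.0, 2.0, 2.0], size=(20, 3)).astype(float)
scores = candidates @ w
best = candidates[scores.argmax()]  # most sports-like candidate wins
```

The rough intuition: option A changes what the model generates, while option B leaves generation alone and spends compute selecting (or, with RLHF-style training, steering toward) outputs the reward model likes. With only a small set of in-domain examples, the reward model route can be more sample-efficient, but this sketch is far too simple to settle that.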
The company operating the Fireside Chat form uses a shady marketing tactic: after one enrolls for the lecture, it auto-subscribes the applicant to its newsletter and asks them to *email back* if they want to unsubscribe, even though the mailing list wasn't mentioned anywhere in the event form:
> You've been added to Rora's newsletter list where you'll receive information on relevant events and industry updates. If you're not interested in being on this list, please email firstname.lastname@example.org.
Also, your name is (mis?)spelled as Michael in the form questions.