May 4, 2023 · edited May 4, 2023 · Liked by Nathan Lambert
I'm of the view that hallucination is actually pretty well understood in LLMs. Borrowing the argument from https://arxiv.org/abs/2303.08896:
When we measure an LLM's output token probabilities via beam search from a given prompt, factual sentences tend to contain tokens with higher likelihood and lower entropy, while hallucinations tend to come from positions with flat, high-uncertainty probability distributions. So one solution is to measure the probability distribution of the tokens in the output, which gives you an estimate of the odds of hallucination. To do this, you need access to the raw probabilities of the result, which I've heard the ChatGPT API no longer allows.
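To make the likelihood/entropy idea concrete, here's a rough sketch in Python. It assumes you already have, for each output position, the probability of the chosen token and a (possibly truncated) distribution over candidates. The function names and the toy numbers are made up for illustration; they don't come from the paper or from any particular API.

```python
import math

def token_entropy(dist):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def sentence_uncertainty(chosen_probs, dists):
    """Return (average negative log-likelihood, average entropy) for a sentence.

    chosen_probs: probability assigned to each token that was actually emitted.
    dists: one probability distribution per output position.
    """
    avg_nll = -sum(math.log(p) for p in chosen_probs) / len(chosen_probs)
    avg_entropy = sum(token_entropy(d) for d in dists) / len(dists)
    return avg_nll, avg_entropy

# Toy numbers, invented for illustration only.
peaked = [[0.97, 0.01, 0.01, 0.01]] * 3   # model is confident at every position
flat = [[0.25, 0.25, 0.25, 0.25]] * 3     # model is unsure at every position

print(sentence_uncertainty([0.97] * 3, peaked))
print(sentence_uncertainty([0.25] * 3, flat))
```

Both numbers come out much higher for the flat case, which is the signal the argument relies on. Where to put a threshold is an open choice that this sketch doesn't address.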
However, an even better way is to run the same question through multiple times and then check how similar the results are using another DNN-based metric, BERTScore (https://github.com/Tiiiger/bert_score). Hallucinations are a product of uncertainty and won't be repeated exactly the same way each time. Consistent results, on the other hand, indicate the model is being true to its training data.
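The resampling check can be sketched roughly like this. To keep it runnable without downloading a model, the similarity function here is a plain unigram-overlap F1, which is a stand-in and not BERTScore; the idea is that a BERTScore-based function could be passed in as `similarity` instead. All names and sample strings are hypothetical.

```python
from collections import Counter

def unigram_f1(candidate, reference):
    """Stand-in similarity: unigram-overlap F1 in [0, 1]. This is NOT BERTScore."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())  # tokens shared by both, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def consistency_score(answer, samples, similarity=unigram_f1):
    """Mean similarity between the main answer and the resampled answers."""
    if not samples:
        raise ValueError("need at least one resampled answer")
    return sum(similarity(answer, s) for s in samples) / len(samples)

# Made-up samples, for illustration only.
stable = ["Paris is the capital of France."] * 3
unstable = [
    "He was born in 1921 in Ohio.",
    "She grew up in Texas.",
    "Born 1954, Maine.",
]

print(consistency_score("Paris is the capital of France.", stable))
print(consistency_score("He was born in 1932 in Utah.", unstable))
```

A low score suggests the answer isn't reproduced across samples, which under this argument points toward hallucination. Word overlap will miss paraphrases, which is presumably why an embedding-based metric like BERTScore is the better fit in practice.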
This is fair, I appreciate the link, I hadn't seen it. I think it doesn't really address the downstream sociotechnical questions I'm trying to dig into.
Two further points:
- When users are returned decoded tokens and not logprobs, this means something.
- I still think we need to expand the taxonomy around hallucinations. It can be an umbrella term, but that's a partial solution.
Here's a class of hallucinations: if you ask a chatbot why it wrote what it did, it doesn't know, but it will invent a plausible answer anyway. More: https://skybrian.substack.com/p/ai-chatbots-dont-know-why-they-did
It seems like this is essentially speculation or brainstorming? Coming up with plausible ideas to try is often helpful, so long as the user is aware it's speculative and has their own way to check them. But there's little we can do with a guess about why a chatbot decided to write something; neither the chatbot nor the user has a way to verify it.
For generating an artistic image, autocomplete, and generating code, we generally assume the user will do the testing (by deciding if it's what they wanted, or by testing the code) and that's okay.
Under this framework, suggesting possible diagnoses to consider to a doctor seems perhaps not that harmful? Treating these guesses as verification would be.
Underlying this framework is an assumption that the user is a responsible person who can use speculation appropriately. This isn't always true, and in safety-critical situations, we assume that people will make mistakes and try to minimize them.
Great note. It depends on where the decision is made, in the healthcare example.
Very different scenarios:
- doctor uses it to brainstorm and reads
- middle bureaucracy uses it and passes it on to the doctor without them knowing
And that in a nutshell is the critical framework for a taxonomy of 'hallucinations' - is it defined from the technology side or from the real world impact side - and do these objectives have common ground? I think so.
Yeah, as we've anthropomorphized the definition as something human, it's now a sociotechnical definition.