6 Comments
May 4, 2023 · edited May 4, 2023 · Liked by Nathan Lambert

I'm of the view that hallucination is actually pretty well understood in LLMs. Borrowing the argument from https://arxiv.org/abs/2303.08896:

When we measure LLM output token probabilities via beam search from a given prompt, factual sentences tend to contain tokens with higher likelihood and lower entropy, while hallucinations tend to come from positions with flat, high-uncertainty probability distributions. So one solution is to measure the probability distribution over the tokens in the output; that tells you the odds of hallucination. To do this, you need access to the raw probabilities of the result, which I've heard the ChatGPT API no longer exposes.
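
If you want to try this on an open model, here's a minimal sketch (my own, not from the paper) using GPT-2 via Hugging Face transformers, with greedy decoding rather than full beam search for simplicity. It prints the probability and entropy at each generated position, the quantities the argument above relies on:

```python
# Minimal sketch: per-token probability and entropy of a model's own output,
# using an open model (GPT-2 here as a stand-in, since the hosted ChatGPT API
# doesn't expose raw probabilities).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # Generate a short continuation and keep the logits at each step.
    out = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=False,
        return_dict_in_generate=True,
        output_scores=True,
    )

generated = out.sequences[0, inputs["input_ids"].shape[1]:]
for step, (token_id, logits) in enumerate(zip(generated, out.scores)):
    probs = torch.softmax(logits[0], dim=-1)
    token_prob = probs[token_id].item()
    entropy = -(probs * torch.log(probs + 1e-12)).sum().item()
    token = tokenizer.decode(int(token_id))
    # Low probability / high entropy at a position is a warning sign of hallucination.
    print(f"{step}: {token!r:12} p={token_prob:.3f} H={entropy:.2f}")
```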

However, an even better way is to run the same question through multiple times and then check how similar the results are using another DNN-based metric, BERTScore (https://github.com/Tiiiger/bert_score). Hallucinations are a product of uncertainty and won't be repeated exactly the same way each time; consistent results, on the other hand, indicate the model is being true to its training data.
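
A rough sketch of that consistency check, using the bert_score package linked above; `ask_llm` is a hypothetical placeholder for whatever chat API or local model you're sampling from:

```python
# Sample several answers to the same prompt and score every pair with BERTScore.
from itertools import combinations
from bert_score import score

def ask_llm(prompt: str) -> str:
    """Placeholder: return one sampled completion for the prompt."""
    raise NotImplementedError

prompt = "Who proved the Poincaré conjecture, and in what year?"
answers = [ask_llm(prompt) for _ in range(5)]

# Low average F1 across pairs means the model isn't telling the same story
# twice, which is the hallucination warning sign described above.
pairs = list(combinations(answers, 2))
cands = [a for a, _ in pairs]
refs = [b for _, b in pairs]
_, _, f1 = score(cands, refs, lang="en", verbose=False)
print(f"Mean pairwise BERTScore F1: {f1.mean().item():.3f}")
```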

May 4, 2023 · edited May 4, 2023 · Liked by Nathan Lambert

Here's a class of hallucinations: if you ask a chatbot why it wrote what it did, it doesn't know, but it will invent a plausible answer anyway. More: https://skybrian.substack.com/p/ai-chatbots-dont-know-why-they-did

It seems like this is essentially speculation or brainstorming? Coming up with plausible ideas to try is often helpful, so long as the user is aware it's speculative and has their own way to check them. But there's little we can do with a guess about why a chatbot decided to write something; neither the chatbot nor the user has a way to verify it.

For generating an artistic image, autocompleting text, or generating code, we generally assume the user will do the testing (by deciding whether it's what they wanted, or by running the code), and that's okay.

Under this framework, suggesting possible diagnoses for a doctor to consider seems perhaps not that harmful? Treating those guesses as verified would be.

Underlying this framework is an assumption that the user is a responsible person who can use speculation appropriately. This isn't always true, and in safety-critical situations, we assume that people will make mistakes and try to minimize them.
