A University of Oxford study developed a means of testing when language models are “unsure” of their output and risk hallucinating.
AI “hallucinations” refer to a phenomenon where large language models (LLMs) generate fluent and plausible responses that are not truthful or consistent.
Hallucinations are tough – if not impossible – to separate from AI models. AI developers like OpenAI, Google, and Anthropic have all admitted that hallucinations will likely remain a byproduct of interacting with AI.
The Cambridge Dictionary even added an AI-related definition of “hallucinate” in 2023 and named it its Word of the Year.
As Dr. Sebastian Farquhar, one of the study’s authors, explains in a blog post, “LLMs are highly capable of saying the same thing in many different ways, which can make it difficult to tell when they are certain about an answer and when they are literally just making something up.”
This University of Oxford study, published in Nature, seeks to answer how we can detect when those hallucinations are most likely to occur.
It introduces a concept called “semantic entropy,” which measures the uncertainty of an LLM’s output at the level of meaning rather than just the specific words or phrases used.
By computing the semantic entropy of an LLM’s responses, the researchers can estimate the model’s confidence in its outputs and identify instances when it’s likely to hallucinate.
Semantic entropy in LLMs explained
Semantic entropy, as defined by the study, measures the uncertainty or inconsistency in the meaning of an LLM’s responses. It helps detect when an LLM might be hallucinating or generating unreliable information.
In simpler terms, semantic entropy measures how “confused” an LLM’s output is.
The LLM will likely provide reliable information if the meaning of its outputs is closely related and consistent. But if the meanings are scattered and inconsistent, that’s a red flag that the LLM might be hallucinating or generating inaccurate information.
Here’s how it works:
- The researchers prompt the LLM to generate several possible responses to the same question, feeding it the question multiple times, each time with a different random seed or a slight variation in the input.
- The method then examines the responses and groups those with the same underlying meaning, even if they use different words or phrasing.
- If the LLM is confident about the answer, its responses should have similar meanings, resulting in a low semantic entropy score. This suggests that the LLM clearly and consistently understands the information.
- However, if the LLM is uncertain or confused, its responses will have a wider variety of meanings, some of which might be inconsistent or unrelated to the question. This results in a high semantic entropy score, indicating that the LLM may hallucinate or generate unreliable information.
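The steps above can be sketched in Python as follows. This is a minimal illustration rather than the study's implementation: the sampled responses are hard-coded instead of drawn from a model, `same_meaning` is a hypothetical placeholder that only compares normalized strings, and the share of responses in each group is assumed as the estimate of that meaning's probability.

```python
import math
from typing import Callable, List


def normalize(text: str) -> str:
    """Toy normalization: lowercase and drop punctuation."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()


def same_meaning(a: str, b: str) -> bool:
    # Hypothetical placeholder. A real system would need a far stronger
    # check of whether two answers mean the same thing.
    return normalize(a) == normalize(b)


def cluster_by_meaning(
    responses: List[str],
    equivalent: Callable[[str, str], bool] = same_meaning,
) -> List[List[str]]:
    """Group responses so that each group shares one underlying meaning."""
    clusters: List[List[str]] = []
    for response in responses:
        for cluster in clusters:
            if equivalent(response, cluster[0]):
                cluster.append(response)
                break
        else:
            # No existing group matched, so this response starts a new one.
            clusters.append([response])
    return clusters


def semantic_entropy(
    responses: List[str],
    equivalent: Callable[[str, str], bool] = same_meaning,
) -> float:
    """Entropy over meaning groups, using group frequencies as probabilities."""
    if not responses:
        raise ValueError("at least one response is required")
    clusters = cluster_by_meaning(responses, equivalent)
    total = len(responses)
    probs = [len(cluster) / total for cluster in clusters]
    return sum(p * math.log(1.0 / p) for p in probs)


# Hard-coded stand-ins for answers sampled from a model.
confident = ["Paris", "paris.", "Paris!"]
uncertain = ["March 3", "June 12", "October 9"]

print(semantic_entropy(confident))
print(semantic_entropy(uncertain))
```

With these toy inputs, the three "Paris" variants fall into a single group and score 0.0, while the three different dates form three groups and score ln 3, roughly 1.10. The higher score is the signal that the answers disagree in meaning, not just in wording.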
To evaluate its effectiveness, the researchers applied semantic entropy to a diverse set of question-answering tasks. This involved benchmarks such as trivia questions, reading comprehension, word problems, and biographies.
Across the board, semantic entropy outperformed existing methods for detecting when an LLM was likely to generate an incorrect or inconsistent answer.
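The article does not say which metric underlies this comparison. One common way to compare detectors of this kind is the area under the ROC curve (AUROC), which is used here as an assumption: it is the probability that a randomly chosen wrong answer receives a higher uncertainty score than a randomly chosen correct one. The scores and labels below are made-up toy values, not results from the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], is_wrong: Sequence[bool]) -> float:
    """Chance that a wrong answer outscores a correct one (ties count half)."""
    wrong = [s for s, w in zip(scores, is_wrong) if w]
    right = [s for s, w in zip(scores, is_wrong) if not w]
    if not wrong or not right:
        raise ValueError("need at least one wrong and one correct answer")
    wins = 0.0
    for score_wrong in wrong:
        for score_right in right:
            if score_wrong > score_right:
                wins += 1.0
            elif score_wrong == score_right:
                wins += 0.5
    return wins / (len(wrong) * len(right))


# Made-up uncertainty scores for six answers, and whether each was wrong.
entropy_scores = [0.0, 0.1, 1.1, 0.9, 1.0, 1.4]
answer_was_wrong = [False, False, True, True, False, True]

print(auroc(entropy_scores, answer_was_wrong))
```

A detector that scores every wrong answer above every correct one reaches 1.0, and one that assigns the same score to everything sits at 0.5. In this toy data, one correct answer has a higher score than one wrong answer, so the result is 8/9.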
The study’s diagram shows how some prompts push the LLM to generate a confabulated (inaccurate, hallucinatory) response. For example, the model produces a day and month of birth for the questions at the bottom of the diagram even though the information required to answer them wasn’t provided in the initial information.
Implications of detecting hallucinations
This work can help explain hallucinations and make LLMs more reliable and trustworthy.
By providing a way to detect when an LLM is uncertain or prone to hallucination, semantic entropy paves the way for deploying these AI tools in high-stakes domains where factual accuracy is critical, like healthcare, law, and finance.
Erroneous results can have potentially catastrophic impacts when they influence high-stakes situations, as shown by some failed predictive policing and healthcare systems.
However, it’s also important to remember that hallucinations are just one type of error that LLMs can make.
As Dr. Farquhar explains, “If an LLM makes consistent mistakes, this new method won’t catch that. The most dangerous failures of AI come when a system does something bad but is confident and systematic. There is still a lot of work to do.”
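A small sketch makes this limitation concrete. It assumes the responses have already been grouped by meaning and are represented by hypothetical labels. The score depends only on how much the meanings vary, not on whether any of them is correct.

```python
import math
from collections import Counter
from typing import List


def entropy_of_meanings(meaning_labels: List[str]) -> float:
    """Entropy of the distribution of meaning labels across sampled answers."""
    counts = Counter(meaning_labels)
    total = len(meaning_labels)
    return sum((n / total) * math.log(total / n) for n in counts.values())


# Hypothetical labels: the model gives the same wrong meaning every time.
consistent_but_wrong = ["wrong_answer"] * 5
# Hypothetical labels: the model's answers disagree with one another.
inconsistent = ["answer_a", "answer_b", "answer_a", "answer_c", "answer_b"]

print(entropy_of_meanings(consistent_but_wrong))
print(entropy_of_meanings(inconsistent))
```

The consistently wrong answers score 0.0, exactly as a consistently right answer would, so a threshold on the score alone cannot flag them.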
Nevertheless, the Oxford team’s semantic entropy method represents a major step forward in our ability to understand and mitigate the limitations of AI language models.
Providing an objective means of detecting hallucinations brings us closer to a future where we can harness AI’s potential while ensuring it remains a reliable and trustworthy tool in the service of humanity.