LLMs like the GPT family, Claude, Gemini, and other AI systems have amazed developers with their ability to generate fluent, human-like text. But if you've ever used ChatGPT or a similar tool, chances are it has confidently told you something wildly untrue at least once. These AI slip-ups, often called hallucinations, range from small factual errors to complete fabrications. They can be amusing (an AI inventing a fictional historical fact) or downright problematic (imagine a code assistant suggesting a non-existent function, or a chatbot making up medical advice). In this post, we'll explore why LLMs hallucinate and, more importantly for developers, how we can mitigate these hallucinations. We'll use intuitive examples and analogies to keep things accessible, and cover a toolbox of free techniques (from clever prompting to RAG) that can help keep your AI outputs grounded in reality.
What Are "Hallucinations" in LLMs?
An AI hallucination is essentially the model making stuff up. It outputs text that sounds plausible and confident but isn't based on truth or reliable data. It's the AI equivalent of "confidently wrong." For example, an LLM asked "Who created Stranger Things?" might reply: "Stranger Things was created by J.J. Abrams and premiered on Netflix in 2016." The answer sounds believable (famous director, correct streaming platform), but it's completely false (the Duffer Brothers created Stranger Things). Think of it like a student who didn't study but makes up convincing answers during the exam. The model wasn't trying to lie; it simply stitched together a plausible answer from its training patterns, without any real fact-checking ability.
Hallucinations can appear in any context: a chatbot might invent personal details about you, a Q&A assistant might cite research papers that don't exist, or a code generator might call functions that sound real but aren't in any API. In one notable case, ChatGPT even fabricated accusations about a real person in a legal summary, leading to a lawsuit. And in the legal world, using AI-generated content without verification has caused trouble – some lawyers have been sanctioned after an AI "invented" fake case law that they unwittingly submitted to a court. Clearly, hallucinations are more than just an academic glitch: they can erode trust, cause errors in software, or create serious real-world consequences if left unchecked.
Why Do LLMs Hallucinate?
Understanding why hallucinations happen makes it easier to prevent them. The root cause lies in how LLMs work. These models don't have a database of verified facts; instead, they generate text by predicting which words (tokens, to be precise) are likely to follow the previous ones, based on patterns learned from massive amounts of training data.
Imagine a super-advanced autocomplete feature that doesn't just finish your sentence, but writes entire paragraphs based on what it thinks should come next. The training goal of an LLM is to always produce the most plausible continuation of text, not necessarily the most truthful one. It will always try to answer, even if the true answer isn't in its knowledge.
Think of an LLM as a student who memorized textbook patterns but not actual facts. When faced with a question beyond what it reliably knows, an LLM will fill in the gaps with something that sounds right. This can yield impressively coherent nonsense, like a smooth-talking friend who makes up facts at a dinner party but sounds so confident that everyone believes them.
Because an LLM is basically a super-powerful autocomplete, it has no built-in sense of truth vs. falsehood. It doesn't actually "know" facts like a database would; it only knows how humans tend to talk about facts. It's like someone who doesn't know history but has read thousands of history books and can mimic how historians write.
Most of the time, common patterns lead to correct answers (it's seen many examples of "Stranger Things created by the Duffer Brothers"), but when data is sparse or ambiguous, the model will make an educated guess. And unlike a human, the AI doesn't get nervous when it's unsure. It doesn't sweat, stutter, or show any signs of uncertainty. It will present its guess with the same confidence as a well-known fact.
This is similar to a GPS that confidently directs you down a road that no longer exists. The map data is outdated, but the voice sounds just as certain as when it's giving correct directions.
Training data issues also play a role. If the training set had inaccuracies or fictional content, the model might reproduce those. Or if asked about very recent events or niche knowledge not in its training cutoff, the model has no choice but to improvise. All of this is why an LLM might firmly answer a question about a 2024 event even if its knowledge ends in 2021 – it's designed to give you some answer. As developers, recognizing this behavior is the first step to controlling it.
Hallucinations in Different Applications
Hallucinations can manifest differently depending on what you're using the LLM for. Let's look at a few scenarios:
Factual Q&A and Chatbots: Here, hallucinations often mean incorrect facts or claims. Ask a chatbot a question like "What is the capital of <made-up country>?", and it may very confidently name a city that doesn't exist. Or it might attribute a famous quote to the wrong person. In customer support bots, hallucinations could mean giving a user the wrong policy information. The risk is misinformation – users might take the answer at face value. (One infamous example: the fabricated legal accusations mentioned earlier, which had real-world consequences.)
Code Generation: For developer-assistants like GitHub Copilot or an LLM via an API, hallucinations show up as plausible-looking but incorrect code. The model might call functions that sound like what you need but aren't real, or use an API incorrectly. It might also produce code that doesn't compile or has security flaws while appearing reasonable. Since LLMs learn from lots of code, they usually output syntactically correct code – which can be misleading because the error is only apparent when you run it. A recent security analysis even warned that blindly using AI-generated code can introduce hidden vulnerabilities.
Creative Writing and Storytelling: When using LLMs for creative tasks (like writing fiction, brainstorming, etc.), hallucination isn't necessarily "wrong" (after all, we expect made-up content in a story). However, even in a narrative, hallucinations can be an issue if the model breaks the story's logic or instructions. For instance, if you ask for a short story about a medieval knight and somewhere midway the AI inexplicably brings in a modern car, that's a kind of hallucination – it's inconsistent with the intended context. In longer generated texts, characters might suddenly know things they shouldn't, or the plot might introduce totally off-topic elements. It's the AI going off on a tangent because it lost track of context.
Knowledgeable Tone vs Knowledgeable Reality: In all these cases, the hallmark of an LLM hallucination is the mismatch between how confident and detailed the answer is, and how true it is. The model will often double down if asked follow-ups, because it doesn't really know it made a mistake. This can be perplexing for users and developers alike.
As developers integrating LLMs, hallucinations mean we have to be careful. We can't blindly trust the model's output in high-stakes settings. The good news is that over the past couple of years, many strategies have emerged to mitigate hallucinations. These range from improving the model's training to adding layers that ground the model's answers in reality. Let's dive into the toolbox of techniques that can help us tame those hallucinations.
Strategies to Mitigate Hallucinations
No single solution can eliminate AI hallucinations completely (at least, not yet). Instead, developers use a combination of approaches to reduce the frequency of hallucinations and limit their impact.
Think of it like managing a brilliant but sometimes unreliable employee: you give them reference materials, double-check their work, set clear guidelines, and train them over time to be more accurate. You wouldn't let a new hire send important client emails without reviewing them first, and similarly, we shouldn't let AI systems provide critical information without proper guardrails. We'll explore several key strategies – all "free" in the sense that they are algorithmic or procedural adjustments you can apply without needing proprietary data. These include injecting real data into the generation process, fine-tuning models on better behavior, cleverly crafting prompts, adding rule-based checks, gauging the model's confidence, and even letting the model critique itself. By mixing and matching these methods, you can drastically improve an LLM's reliability. Let's go through them one by one.
Retrieval-Augmented Generation (RAG) - Grounding the AI in Real Data
One of the most effective ways to stop hallucinations is to give the LLM access to actual facts at runtime. RAG does exactly that. The idea is simple: before the model answers a question or completes a task, we fetch relevant information from an external knowledge source (like a database, documentation, or the web) and supply it to the model as additional context. This way, the LLM isn't relying solely on what's "in its head" (the training data), but it has some up-to-date, specific facts to work with.
Imagine you're asked: "Who was the President of the United States in 1881?" If you had no resources, you might guess and potentially get it wrong. But if you quickly flip open a history book, you can confidently answer (it was James A. Garfield for part of 1881, then Chester A. Arthur). RAG gives the LLM that "open book" to refer to. Instead of hallucinating, the model can quote or summarize the retrieved information.
How does this work in practice for developers? Typically, a RAG pipeline will use something like a vector database or search API to find documents related to the user's query. For example, if the user asks about a company's policy, the system might retrieve the relevant section of the company policy document. The retrieved text is then appended to the prompt (or otherwise incorporated), and the LLM is asked to generate an answer using that context. Because the answer is now grounded in real reference text, the likelihood of random fabrications drops dramatically. The model tends to stick to the provided facts.
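To make this concrete, here's a minimal sketch of the retrieve-then-generate flow. The `search_docs` retriever, the `gpt-4o-mini` model name, and the prompt wording are all assumptions for illustration; swap in your own vector store, search API, and model.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer_with_rag(question: str, search_docs) -> str:
    """Retrieve supporting text, then ask the model to answer only from it.

    `search_docs` is a stand-in for your retriever (vector DB, search API, ...).
    """
    # 1. Retrieve the top snippets relevant to the question.
    snippets = search_docs(question, top_k=3)
    context = "\n\n".join(snippets)

    # 2. Ground the generation in the retrieved context.
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say 'I don't know.'\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content
```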
Crucially, a well-implemented RAG system will also often show the sources it used, which adds transparency. For instance, some applications display the snippets of text from which the answer was derived. This helps users trust the answer (or verify it themselves). RAG is quickly becoming a go-to solution in enterprise AI apps because it tackles two problems at once: the model's knowledge cutoff/outdated training data, and the hallucination issue. By anchoring the LLM with up-to-date, relevant info, we rein in its tendency to improvise unfounded content.
(To learn more about RAG, you can explore my RAG_TECHNIQUES repo on GitHub 📚)
Of course, RAG isn't foolproof. If your knowledge base is incomplete or if the retrieval step fails to find the right info, the model could still hallucinate or just draw from its general knowledge. And combining retrieval with generation introduces complexity – you need to maintain the knowledge index and ensure the retrieval is relevant. But in practice, RAG drastically improves reliability for tasks like customer support (the bot cites the actual policy text) or coding assistants (the bot references documentation for an API, rather than guessing usage). For many developers, RAG is the first line of defense against hallucinations: don't let the model guess; let it look up the answer.
Fine-Tuning and Alignment - Training the Model to Be More Truthful
Another powerful approach is fine-tuning the LLM on custom data or via specialized training processes to reduce its likelihood of hallucination. Fine-tuning means we take a pre-trained model and further train it on a narrower dataset or with additional objectives. One straightforward idea: fine-tune the model on a high-quality dataset of question-answer pairs where all answers are grounded in verified facts (or on a corpus of documentation for a specific domain). This can make the model more knowledgeable about that domain and less likely to make things up. Essentially, the model doesn't have to fill gaps with guesses if the gaps have been filled with real data during fine-tuning.
Fine-tuning isn't just about adding facts; it can also teach the model behavior. One popular method is Reinforcement Learning from Human Feedback (RLHF) – used in training ChatGPT – where humans rate the model's outputs and the model is tuned to prefer responses that are not only helpful but also truthful. If during this feedback process the model is penalized for hallucinating or rewarded for saying "I don't know" when appropriate, it can learn to be more cautious with uncertain answers. OpenAI's models, for example, have been fine-tuned to often respond with a refusal or a disclaimer rather than spouting a random answer to certain queries. This is why ChatGPT might say "I'm sorry, I don't have information on that" for very obscure questions – that behavior was trained.
There are also specialized fine-tuning approaches targeting factual accuracy. Some research projects create datasets of known pitfalls or false statements and train the model to correct them. Others use contrastive or instruction-style fine-tuning, where the model is trained with explicit instructions to avoid unsupported claims – for instance, "If you are not fully sure of a factual detail, explicitly say you are uncertain." Over many examples, the model can internalize the habit of not fabricating details.
In addition to full model fine-tuning, developers can use lighter-weight tuning techniques like LoRA (Low-Rank Adaptation) or prompt tuning on smaller datasets to adjust a model's style. These can be used to make a model more terse (less likely to ramble into hallucination), or to bias it toward answers that cite source material.
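As a rough illustration, here's what attaching LoRA adapters looks like with Hugging Face's peft library. The base model (gpt2), the target module names, and the hyperparameters are placeholder assumptions for this sketch; a real project would pick a larger open model and tune these values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "gpt2"  # stand-in: in practice you'd use a larger open causal LM
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Attach small low-rank adapter matrices instead of updating every weight.
lora_config = LoraConfig(
    r=8,                        # adapter rank: small = cheap to train
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection name; model-dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights will train

# From here, fine-tune on grounded question-answer pairs with your usual
# training loop (e.g. the transformers Trainer), then save or merge the adapter.
```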
Fine-tuning does require effort – you need training data and computational resources – but it's a direct way to fill the model with knowledge and caution. The nice thing is that the techniques are "free" in the sense of being open and well-studied: there are open-source tools and models available that you can fine-tune on your own data. By fine-tuning, you essentially teach the model the right answers and behaviors instead of relying on post-processing to fix mistakes. A well-aligned model will naturally hallucinate less. (It might still refuse to answer more often if tuned to be safe – which is another trade-off to manage.)
Prompt Engineering - Asking the Right Way to Get Reliable Answers
Sometimes, you don't need to change the model at all – just change how you ask it. Prompt engineering is the art of phrasing your input or system instructions in a way that steers the model towards the kind of output you want. By carefully constructing prompts, we can often reduce hallucinations significantly.
One simple technique is to explicitly instruct the model to be truthful and to admit ignorance. For example, instead of just asking an open question, you might say: "Answer the following question using the provided context. If the answer is not in the context or you're not sure, say 'I don't know'." This gives the model a clear rule to follow. While not all models will obey perfectly, many will at least be more hesitant to fill the void with pure invention if you've told them not to. Prompting the model to show its reasoning can also help; e.g., "Think step by step and ensure each step is based on known facts before giving a final answer." For some models, this chain-of-thought prompting leads to more accurate answers because the model "walks itself" through a reasoning process, which can correct obvious mistakes along the way.
Another effective approach to prompt engineering is showing the model examples of honest and accurate answers. This technique, called "few-shot prompting," teaches the model through demonstration. For example:
Q: Who discovered penicillin?
A: Alexander Fleming discovered penicillin in 1928.
Q: Who is the author of the novel Moby-Dick?
A: Herman Melville wrote Moby-Dick, published in 1851.
Q: What is the boiling point of water in Celsius?
A: The boiling point of water at standard atmospheric pressure is 100°C (212°F).
Q: (Your actual question here)
A:
By showing the model examples where it gives factual, concise answers (and even examples where it admits uncertainty when appropriate), you're essentially showing it what good behavior looks like. This creates a pattern that the model tends to follow, making it more likely to stick to verified facts and less likely to invent information. It's like training a new employee by showing them examples of excellent work before asking them to complete a similar task.
Prompt engineering also covers system messages or role instructions in chat-based models. If you're using an API where you can set a system-level prompt, you can say things like: "You are a helpful assistant who always backs up claims with evidence and never fabricates information. If you don't know something, you clearly say you do not know." This creates a bias in the model's responses towards caution and evidence. While it's not bulletproof (the model might still hallucinate without realizing it), it does shape the style and can cut down on false details.
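Here's a minimal sketch of wiring such a system message into a chat call with the OpenAI Python client; the model name and exact wording are assumptions you'd adapt to your own setup.

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are a helpful assistant who always backs up claims with evidence "
    "and never fabricates information. If you don't know something, "
    "clearly say you do not know."
)

def cautious_answer(question: str) -> str:
    # The system message biases every reply toward caution and evidence.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: swap in whatever chat model you use
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response.choices[0].message.content
```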
A more programmatic form of prompt engineering is dynamically altering prompts based on context. For instance, if you detect that the user's question is about a very specific policy, you might automatically insert a note in the prompt: "(Note: Only use the company policy document for answering this)". This context reminds or forces the model to stick to the provided info.
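A tiny, hypothetical example of this kind of dynamic prompt construction might look like the following; the `build_prompt` helper and the trigger keywords are invented for illustration, and real routing logic would be more robust.

```python
def build_prompt(question: str, policy_doc: str = "") -> str:
    """Inject a grounding note when the question looks policy-related."""
    # Illustrative trigger keywords; use whatever routing logic fits your app.
    policy_keywords = ("policy", "refund", "terms", "warranty")
    if policy_doc and any(word in question.lower() for word in policy_keywords):
        return (
            "(Note: Only use the company policy document below for answering this.)\n\n"
            f"{policy_doc}\n\nQuestion: {question}"
        )
    return question
```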
All these prompting strategies are essentially guidelines and constraints given in natural language to the model. They're "free" to try and often surprisingly effective. The key is to experiment – different phrasings can produce different behaviors from the same model. As LLMs are essentially pattern mimickers, giving them a pattern of truthful, source-based answering often leads them to follow suit. Just remember: prompt tweaks can reduce hallucinations, but may not eliminate them, so they're often used in tandem with other methods like retrieval or post-checks.
Rule-Based Post-Processing and Guardrails - Catching Mistakes Before They Matter
Even with retrieval and good prompting, it's wise to have a safety net. Rule-based filters and guardrails act as an additional layer that checks the model's output and steps in if something looks off. These are basically if-then rules or automated checks that the AI's answer must pass before it's shown to the end user (or before it's accepted as final). Think of it like a quality control inspector at a factory, checking products before they go out to customers.
A simple example of a guardrail: after getting an answer from the LLM, you might run a check for any URLs or citations it provided. If the model says "according to xyz study (Smith, 2022)...", your system could attempt to verify that such a study exists in a database or via a quick web search. If it doesn't, you have caught a likely hallucination (LLMs infamously have made up many academic references!). You could then either ask the model to try again, or flag the answer with a disclaimer.
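As a sketch of the URL half of that check, the snippet below pulls links out of an answer and flags any that don't resolve. It's deliberately simplistic (a live URL doesn't prove the claim is supported, and hallucinated citations often carry no URL at all), but it catches the obvious cases.

```python
import re
import requests

URL_PATTERN = re.compile(r"https?://\S+")

def find_dead_links(answer: str) -> list[str]:
    """Flag URLs in the model's answer that do not resolve.

    A dead or malformed link is a strong hint the citation was hallucinated.
    """
    suspect = []
    for url in URL_PATTERN.findall(answer):
        url = url.rstrip(").,;")  # strip trailing punctuation
        try:
            resp = requests.head(url, allow_redirects=True, timeout=5)
            if resp.status_code >= 400:
                suspect.append(url)
        except requests.RequestException:
            suspect.append(url)
    return suspect

# if find_dead_links(answer): re-ask the model or attach a disclaimer
```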
For code generation, a rule-based approach could be to compile or run tests on the generated code in a sandbox. If the code fails or produces errors, you know the model's output wasn't correct. Some tools will even loop back and show the error to the model, prompting it to fix the code. This kind of "self-healing" workflow uses a hard external signal (the code didn't run) to correct a hallucination (the code was wrong or called things that don't exist).
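A bare-bones version of that loop might look like the sketch below. Note that a plain subprocess is not a real sandbox (use containers or similar isolation for untrusted code); this only illustrates how a hard execution signal can be captured and fed back to the model.

```python
import subprocess
import sys
import tempfile

def run_generated_code(code: str) -> tuple[bool, str]:
    """Run model-generated Python in a subprocess; return (success, error text)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=10
        )
    except subprocess.TimeoutExpired:
        return False, "execution timed out"
    return result.returncode == 0, result.stderr

# ok, error = run_generated_code(generated_code)
# if not ok:
#     # show `error` to the model and ask it to fix its own code, then retry
#     pass
```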
In conversational AI, you might have a list of forbidden patterns. For instance, if your chatbot should not invent legal advice, you might scan its response for sentences that contain certain legal terms and no citation, and replace or remove them. Developers also implement consistency checks – ask the same question in slightly different ways and see if the answers align. If one answer says A and another says B, a simple rule might be "don't trust either without further verification" or send the question to a human.
Modern AI "guardrail" libraries (like Microsoft's guidance or OpenAI's system messages) allow you to specify these rules in flexible ways. For example, you can require that certain keywords from the prompt appear in the answer. If the user asked about X and the answer doesn't even mention X, maybe the model got off-track (hallucinated the interpretation of the question itself). Similarly, using guardrails to ensure the model's output stays within the provided source material – any info not grounded in the source is considered "un-grounded" and can be filtered out.
Another technique is to use external knowledge sources post-answer to verify facts. Suppose the model answers a factual question without retrieval (maybe because you don't have a knowledge base). You could take its key statement and issue a search query to see if credible sources confirm it. If not, you treat it as suspect. This is like doing a quick fact-check on the model. It's not always easy to parse an arbitrary answer into a checkable query, but for numeric answers or specific claims it can work well.
The beauty of rule-based approaches is determinism: you have explicit control and can enforce certain safety nets with 100% consistency. They won't catch everything (you have to anticipate the kinds of mistakes to catch), but they are a great backstop. In essence, think of it as putting the model's answer through a review process – even if the model "sounded" confident, your program can say, "Hold on, does this answer meet our criteria for being trustworthy? If not, handle accordingly." This might result in asking the model to clarify, adding a disclaimer, or in some cases just refusing the answer if it's too likely to be wrong.
Confidence Scoring and Uncertainty Estimation - Knowing When You Don't Know
One challenge with LLM hallucinations is that the model itself doesn't tell you when it's guessing. It'll produce a fluent answer regardless. However, under the hood, we might get some signals about its confidence – or we can design the interaction to check it. If we can estimate whether an answer is likely a hallucination, we can then choose to suppress or double-check those answers.
LLMs generate text by assigning probabilities to possible next words. If at some point the model was very unsure (say, several words had similarly low probabilities and it had to pick one), that could indicate it was in unfamiliar territory – a potential hallucination point. Some researchers have looked at using the model's own log probabilities to flag low-confidence sections. For instance, if the model's probability distribution was very flat when it produced the name of a person or a piece of data, it might have just picked something arbitrarily. In a system, one could set a threshold: if the model's confidence for the full answer is below a certain level, maybe don't trust it. This approach, while promising, is tricky because raw probabilities aren't always a perfect reflection of truth confidence (the model might be wrong but confident, which is the worst case!). Still, it's a clue.
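If your API exposes token log probabilities (the OpenAI chat API does via the `logprobs` parameter), a rough confidence wrapper might look like the sketch below. The averaging and the 0.7 threshold are arbitrary choices for illustration, not a calibrated method, and the model name is an assumption.

```python
import math
from openai import OpenAI

client = OpenAI()

def answer_with_confidence(question: str, threshold: float = 0.7):
    """Return the answer plus a crude confidence score from token probabilities."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any model that can return logprobs
        messages=[{"role": "user", "content": question}],
        logprobs=True,
        temperature=0,
    )
    choice = response.choices[0]
    token_probs = [math.exp(t.logprob) for t in choice.logprobs.content]
    avg_prob = sum(token_probs) / len(token_probs)

    answer = choice.message.content
    # A flat, low-probability generation is a hint the model was guessing.
    label = "low-confidence" if avg_prob < threshold else "ok"
    return answer, avg_prob, label
```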
Another approach is to use an ensemble of model answers. You ask the model (or multiple instances of the model with different random seeds) the same question multiple times. If it's telling the truth, chances are the answers will converge (because truth is like an attractor). If it's hallucinating, you might get different fabricated answers each time. For example, ask "What's the population of Mars?" five times and you might get five different numbers (all wrong, of course). If your system detects high variance in answers, that's a sign of low reliability. You could then either present no answer or present the range of answers with a caution. This is related to a technique called self-consistency, where the model's reasoning is sampled multiple times and only a consistent answer is taken as final.
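A naive version of this variance check is sketched here. Exact string matching is used only for brevity; in practice you'd normalize the answers or compare them semantically before counting agreement, and the model name and agreement threshold are assumptions.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()

def self_consistent_answer(question: str, n: int = 5, min_agreement: float = 0.6):
    """Sample the same question several times; keep the answer only if it converges."""
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption
            messages=[{"role": "user",
                       "content": question + " Answer in one short sentence."}],
            temperature=0.8,      # sampling diversity is the whole point here
        )
        answers.append(resp.choices[0].message.content.strip())

    most_common, count = Counter(answers).most_common(1)[0]
    if count / n >= min_agreement:
        return most_common
    return None  # answers diverged: likely unreliable, escalate or say "I don't know"
```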
You can also train a lightweight classifier model to sit on top of the LLM and predict if an output is hallucinated or not. This could be as simple as a model that checks if the answer contains out-of-context info. Or a more advanced approach where you label a bunch of outputs as truthful or hallucinated and train a classifier on that (some research papers have attempted this anomaly detection approach). This classifier won't be perfect, but it might catch the obvious failures.
One more clever idea: have the model reflect on its answer. After the model gives an answer, you can ask it (or a second instance of it), "On a scale of 1-10, how confident are you that the above answer is correct and why?" Surprisingly, sometimes the model will acknowledge uncertainty upon reflection and even point out possible errors. For instance, it might say, "I'm not entirely sure about the birth date I gave, that might be incorrect." This self-evaluation can be treated as a confidence score. If the model gives a low score or identifies a mistake, you can prompt it to correct the answer or simply not use that answer.
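Here's one hedged way to turn that reflection into a numeric signal; the prompt wording and the choice of model are assumptions, and the returned score is only a heuristic, not a calibrated probability.

```python
from openai import OpenAI

client = OpenAI()

def self_rate(question: str, answer: str) -> int:
    """Ask the model to score its own answer 1-10; treat low scores as a red flag."""
    critique = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: reuse whatever model produced the answer
        messages=[{
            "role": "user",
            "content": (
                f"Question: {question}\nAnswer: {answer}\n\n"
                "On a scale of 1-10, how confident are you that the answer above "
                "is factually correct? Reply with only the number."
            ),
        }],
        temperature=0,
    )
    try:
        return int(critique.choices[0].message.content.strip())
    except ValueError:
        return 0  # unparseable reply: treat as low confidence
```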
Finally, controlling the decoding settings of the model can influence confidence. If you set a high randomness (temperature) and allow very diverse outputs, you might get more creative but less reliable answers. Lowering the temperature makes the model more deterministic and often more factual (but could also cut down on useful detail). There's a balance to strike: for factual tasks, using a slightly more conservative decoding (like a narrower beam or lower temperature) can reduce strange outputs.
In summary, confidence estimation is about building a wrapper around the LLM that can say "hmm, this answer doesn't look very confident or consistent." While the model itself doesn't truly know when it's hallucinating, these proxy measures can help catch many hallucinations in the act, allowing the system or user to handle them appropriately instead of taking them at face value.
Self-Reflection and Iterative Refinement - Let the Model Check Its Work
Large language models are surprisingly capable of analyzing and critiquing text, including their own generated text. This opens up a mitigation strategy where we actually ask the model to self-reflect and fix potential hallucinations. Think of it as code review, but the code is the answer and the reviewer is also the model (or another model of similar capability).
One approach is self-refinement: you prompt the model with something like, "Here is your answer. Now, double-check each fact in it. If you find any unsupported claim, correct it." The model will then go through its answer and often identify sentences that might be questionable. It might say, "Upon reviewing, I realize I stated X as a fact, but I am not entirely sure it's correct. I will adjust that." and then provide a revised answer. This works surprisingly well in some cases – the act of reflection seems to reduce the model's eagerness to bluff. It's as if generating the answer and then looking at it after the fact gives the model a second chance to catch errors. Researchers have found that such self-reflection coupled with retrieval can significantly cut down hallucinations, especially in multi-step reasoning tasks.
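A single critique-and-revise pass can be implemented with one extra model call, roughly like this sketch (model name and prompt wording are assumptions; production systems often add retrieval to the review step):

```python
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumption: any capable chat model

def refine_answer(question: str, draft: str) -> str:
    """One critique-and-revise pass over the model's own draft answer."""
    review_prompt = (
        f"Question: {question}\n\nDraft answer:\n{draft}\n\n"
        "Double-check each factual claim in the draft. If any claim is "
        "unsupported or uncertain, rewrite the answer to correct or hedge it. "
        "Return only the revised answer."
    )
    revised = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": review_prompt}],
        temperature=0,
    )
    return revised.choices[0].message.content
```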
Another method is chain-of-thought with self-consistency. Earlier we mentioned running multiple reasoning chains. In practice, you can have the model generate a detailed step-by-step reasoning (which might include checks like "is this fact known to me?") and then see if the final answer stays consistent each time. By picking the most common outcome or asking the model to fix differences, you get a more robust answer. Essentially, you're using the model's own reasoning to verify itself.
There's also a concept of "assistant-overseer": you have one instance of the model propose an answer, and another instance (possibly with a prompt making it act as a critic or fact-checker) analyze that answer. The critic model might catch an obvious hallucination and say, for example, "The answer above claims that AngularJS was first released in 2012, but I recall it was 2010. This might be incorrect." You can then feed this feedback to the original model to correct the answer. This two-model setup is like having an AI pair programming partner or a second opinion. It leverages the idea that while one model might slip, two models reviewing can spot each other's mistakes to some extent.
Yet another variant: interactive prompting. Instead of the user asking once and getting a one-shot answer, design the system to have a back-and-forth. The model answers, the system (or user) asks a follow-up "Are you sure about X? How did you get that?", and the model then has to justify or reconsider. This dialogue can nudge the model to either provide source justification (if it has any) or to backtrack on something it made up. In a way, it's like how you might press a human expert to be sure they're not guessing – if they can't provide solid reasoning or evidence, you become skeptical of their answer.
These self-reflection techniques often tie together with earlier ones: e.g., a self-reflection might trigger a retrieval (the model might ask itself for evidence, which could be another retrieval call), or it might interface with rule-based checks ("let me verify against known data"). The boundaries blur, but the core idea is using the LLM's own intelligence to improve itself. After all, these models are quite good at language tasks, so why not assign them the task of proofreading and fact-checking?
It's worth noting that self-reflection does cost more in terms of computation (you're essentially doing multiple passes of generation), and it's not guaranteed to catch everything. Some false statements might seem fine to the model even upon reflection if it strongly "believes" them due to training bias. However, studies and practical implementations have shown big drops in hallucination rates when employing these methods. One study managed to reduce a model's hallucination rate from roughly 47.5% down to about 14.5% by actively detecting low-confidence parts of the generation and correcting them in an iterative loop. That's a huge improvement, achieved by a kind of automated self-editing process.
As developers, implementing this might mean writing a bit more logic in our prompting flows or orchestrating multiple calls to the model. Libraries like LangChain and others are making it easier to create such multi-step interactions where the model can be prompted to critique and refine its outputs. When the stakes are high for correctness, the extra complexity is often worth it.
Bringing It All Together
LLM hallucinations are a fact of life when working with generative AI today, but they're not an insurmountable obstacle. We've seen that these hallucinations happen because LLMs are wired to always give an answer, even if they have to fabricate it. They're masters of form, sometimes at the expense of truth. But as developers, we have many tools and techniques to curb this tendency:
Give the model real data to chew on (RAG) – this greatly reduces the need for the model to invent facts, since it can pull from actual documents. It's like providing an open-book exam instead of a closed-book one.
Train or fine-tune the model to be more fact-aware – aligning it with domain knowledge and truthful behavior so it doesn't want to hallucinate in the first place.
Ask in the right way (prompt engineering) – sometimes, just phrasing a query differently or adding instructions like "explain your reasoning" or "don't guess if unsure" steers the model to a better answer.
Use guardrails and checks – don't just trust the output blindly. Implement rules to verify critical details, and have fallbacks for when the model's answer doesn't pass the sniff test.
Estimate confidence or use multiple answers – treat the model a bit like a black box from which you can gather signals. If it's shaky or inconsistent, handle the response with care rather than presenting it as fact.
Let the model double-check itself – often, an LLM can identify its own errors when asked to review its output. This self-critique can be turned into corrections, resulting in a more accurate final answer.
Each of these strategies has its pros and cons, and in practice, the best solution is usually a combination. For example, a robust Q&A system might retrieve documents (RAG), prompt the model with an "answer only from these docs" instruction (prompt engineering), then have a final step where any claims in the answer are compared against the source text (rule-based check), and even a quick sanity-check prompt "Is everything in the answer supported by the text?" (self-reflection). With all that, the chance of a hallucination slipping through drops dramatically.
It's also important to match the strategy to the application. For code generation, automated testing and perhaps fine-tuning on a corpus of correct code might be key. For a chatbot, keeping a tight leash via prompt instructions and a library of factual references can help. In a creative writing app, you might allow more hallucination since it's "creative," but still use mild guardrails to keep the story coherent. The context – who the users are, what's the risk of a hallucination, and what resources are available – will inform your mitigation plan.
In the end, mitigating hallucinations is about making the AI a reliable partner. Just as you'd verify and guide a human assistant, you verify and guide the LLM. The field is advancing quickly: new research is coming out with better techniques (like knowledge graph integration or advanced consistency training). Many of these are freely available in open-source implementations or can be improvised with the tools we have. By applying these methods, developers of all levels can significantly improve the factual accuracy and trustworthiness of their LLM-powered applications.
Hallucinations might not be completely cured yet, but with the strategies we've discussed, you can keep those AI daydreams in check – ensuring your LLM does what you want it to do: assist with correct information, useful code, and coherent, factual responses. Happy coding, and may your AI stay firmly grounded in reality!