Why the Bots Hallucinate – and Why It's Not an Easy Fix

It’s a common lament: “I asked ChatGPT for scientific references, and it returned the names of non-existent papers.”

How and why does this happen? Why would large language models (LLMs) such as ChatGPT create fake information rather than admitting they don’t know the answer?

And why is this such a complex problem to solve?

LLMs are an increasingly common presence in our digital lives. (Less sophisticated chatbots do exist, but for simplicity, I’ll refer to LLMs as “chatbots” in the rest of this post.) These AI-driven systems rely on complex algorithms to generate responses based on their training data. In this blog post, we’ll look at how chatbots produce their responses and the constraints they operate under. Hopefully, this will shed some light on why they sometimes "hallucinate."

How do chatbots work?

Chatbots such as ChatGPT are designed to engage in conversational interactions with users. They are trained on large amounts of text data to understand language patterns and generate coherent responses. For this, they use natural language processing (NLP) techniques. These techniques are enabled by machine learning algorithms. Let’s take a closer look at what that means.

What are natural language processing (NLP) techniques?

Here are a few NLP techniques commonly used by chatbots:

Tokenization is the process of breaking a sentence or text down into smaller units called tokens, such as words or subwords.
Named Entity Recognition identifies specific named entities, such as names, places, organizations, and dates.
Part-of-Speech Tagging assigns grammatical tags to words, indicating their roles in a sentence.
Sentiment Analysis determines the emotional tone expressed in a user's input: positive, negative, or neutral.

These techniques help chatbots understand structure, extract relevant information, analyze syntax, and adapt responses based on the user's emotional context.
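
To make these techniques a little more concrete, here is a minimal Python sketch of two of them: a toy tokenizer based on a regular expression, and a lexicon-based sentiment check. The example sentence and word lists are made up for illustration; real chatbots use far more sophisticated methods, such as subword tokenizers and trained sentiment classifiers.

```python
import re

def tokenize(text):
    # Break the text into lowercase word tokens. Production systems use
    # subword tokenizers (e.g. byte-pair encoding); this is a stand-in.
    return re.findall(r"[a-z']+", text.lower())

# Toy sentiment analysis: count matches against tiny hand-made lexicons.
POSITIVE = {"great", "good", "love", "helpful", "thanks"}
NEGATIVE = {"bad", "wrong", "hate", "useless", "broken"}

def sentiment(tokens):
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

user_input = "Thanks, the last answer was really helpful!"
tokens = tokenize(user_input)
print(tokens)             # ['thanks', 'the', 'last', 'answer', 'was', 'really', 'helpful']
print(sentiment(tokens))  # positive
```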

What are machine learning algorithms?

To enable the NLP techniques mentioned above, chatbots use various machine learning algorithms. These are mathematical formulas or procedures that enable computers to learn from data and make predictions or decisions without being explicitly programmed.

One common algorithm is the "sequence-to-sequence" model, which consists of two main parts: an encoder and a decoder. The encoder converts the user's input into a numerical representation that the machine can understand. The decoder then uses this representation to generate a response.
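
As a rough illustration of that two-part structure, here is a toy encoder-decoder in Python with NumPy. The vocabulary, embeddings, and weights are invented and untrained, so the generated "reply" is meaningless; the point is only to show how an input sentence becomes a numerical context vector and how a decoder turns that vector back into words, one at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny made-up vocabulary shared by the encoder and the decoder.
vocab = ["<start>", "<end>", "hello", "how", "are", "you", "fine", "thanks"]
word_to_id = {w: i for i, w in enumerate(vocab)}

EMBED_DIM = 8
embeddings = rng.normal(size=(len(vocab), EMBED_DIM))       # numerical representation of each word
decoder_weights = rng.normal(size=(EMBED_DIM, len(vocab)))  # maps a context vector to word scores

def encode(tokens):
    # Encoder: turn the user's words into one fixed-size numerical vector.
    ids = [word_to_id[t] for t in tokens]
    return embeddings[ids].mean(axis=0)

def decode(context, max_len=5):
    # Decoder: repeatedly pick the highest-scoring word given the context.
    output = []
    for _ in range(max_len):
        scores = context @ decoder_weights
        word = vocab[int(np.argmax(scores))]
        if word == "<end>":
            break
        output.append(word)
        context = context + embeddings[word_to_id[word]]  # fold the emitted word back in
    return output

context = encode(["how", "are", "you"])
print(decode(context))  # with untrained random weights the reply is arbitrary
```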

Another algorithm used is the "recurrent neural network." This algorithm is good at understanding and generating sequences of words. It learns patterns and associations in the training data to make predictions about what the next word should be based on the context.
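
The sketch below shows the core of a recurrent network in a few lines of NumPy. A hidden state is updated word by word as the context is read, and is then used to assign a probability to every word in a tiny made-up vocabulary as a candidate next word. The weights are random rather than trained, so the probabilities are arbitrary; only the mechanics are the point.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab = ["the", "bot", "answers", "questions", "<end>"]
V, H = len(vocab), 16

# Untrained, randomly initialized parameters of a single recurrent cell.
W_xh = rng.normal(scale=0.1, size=(H, V))  # current word -> hidden state
W_hh = rng.normal(scale=0.1, size=(H, H))  # previous hidden state -> hidden state
W_hy = rng.normal(scale=0.1, size=(V, H))  # hidden state -> next-word scores
b_h, b_y = np.zeros(H), np.zeros(V)

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Read the context word by word, carrying a hidden state that summarizes it.
h = np.zeros(H)
for word in ["the", "bot"]:
    x = one_hot(vocab.index(word))
    h = np.tanh(W_xh @ x + W_hh @ h + b_h)

# Probabilities the model assigns to each possible next word.
probs = softmax(W_hy @ h + b_y)
for word, p in zip(vocab, probs):
    print(f"{word:10s} {p:.2f}")
```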

The training phase

During the training phase, chatbots are exposed to large datasets comprising human-generated text, drawn from sources such as websites, books, social media platforms, and other text-based material. The collected data forms a training corpus, which serves as the basis for teaching the chatbot to understand and respond to user queries.

The chatbot's algorithms analyze the language patterns, semantics, and contextual cues in the training data. This helps it learn the relationships between words, grammar, syntax, and semantics. By recognizing patterns and associations between user queries and appropriate responses, the chatbot learns to generate contextually relevant replies.

Chatbot response generation works through statistical analysis. The bot analyzes the frequency and co-occurrence of words, phrases, and linguistic structures in the training data. The model learns to assign probabilities to different words or sequences of words based on these observed patterns. This enables the chatbot to generate appropriate responses. It selects the most likely words or sequences of words based on the given context.
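
A heavily simplified version of this idea is a bigram model: count how often each word follows each other word in a (made-up) training corpus, turn those counts into probabilities, and generate text by repeatedly picking the most likely next word. Modern LLMs condition on much longer contexts using neural networks, but the underlying principle of choosing the most probable continuation is the same.

```python
from collections import Counter, defaultdict

# A tiny invented "training corpus".
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]

# Count how often each word follows each other word (co-occurrence statistics).
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

def next_word_probabilities(word):
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("the"))
# e.g. {'cat': 0.33, 'mat': 0.17, 'mouse': 0.17, 'dog': 0.17, 'rug': 0.17}

# Generate a "reply" by always choosing the most probable next word.
word, reply = "the", ["the"]
for _ in range(4):
    if not following[word]:
        break
    word = following[word].most_common(1)[0][0]
    reply.append(word)
print(" ".join(reply))  # e.g. "the cat sat on the"
```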

Neural networks are commonly used in training chatbots. These models excel at sequence-to-sequence learning. This is essential for understanding and generating conversational responses.

The training process involves feeding the training corpus to the model, which learns to predict the next word or sequence of words from the given context. Through many small adjustments, the model gradually gets better: little by little, it learns to generate responses that make sense and fit the conversation.
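
Here is a small sketch of what those gradual improvements look like in practice: a tiny next-word model (a single weight matrix followed by a softmax) trained with gradient descent on a handful of made-up word pairs. Each pass over the data nudges the weights so that the words actually observed next become slightly more probable, and the printed loss shrinks accordingly.

```python
import numpy as np

rng = np.random.default_rng(2)

# A tiny invented corpus, turned into (previous word -> next word) training pairs.
corpus = "the cat sat on the mat the cat chased the mouse".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}
V = len(vocab)
pairs = [(word_to_id[a], word_to_id[b]) for a, b in zip(corpus, corpus[1:])]

W = rng.normal(scale=0.1, size=(V, V))  # scores for "next word" given "previous word"

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

learning_rate = 0.5
for epoch in range(100):
    total_loss = 0.0
    for prev_id, next_id in pairs:
        x = np.zeros(V)
        x[prev_id] = 1.0                        # one-hot previous word
        probs = softmax(W @ x)                  # predicted next-word distribution
        total_loss += -np.log(probs[next_id])   # cross-entropy: penalize wrong guesses
        grad = probs.copy()
        grad[next_id] -= 1.0
        W -= learning_rate * np.outer(grad, x)  # one "minor improvement" to the parameters
    if epoch % 20 == 0:
        print(f"epoch {epoch:3d}  loss {total_loss:.2f}")

# After training, most of the probability goes to words that actually followed "the".
x = np.zeros(V)
x[word_to_id["the"]] = 1.0
probs = softmax(W @ x)
print({w: round(float(probs[i]), 2) for w, i in word_to_id.items()})
```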

Why do they 'hallucinate'?

During the training phase, chatbots are exposed to large datasets that cover a wide range of topics. This exposure enables them to handle diverse user queries. However, the quality and composition of the training data shape the chatbot's behavior and the accuracy of its responses. Incomplete representations in the data can lead to a limited understanding of certain topics.

Once deployed, chatbots interact with users in real-time. When a user submits a query, the chatbot processes the input. It applies its trained statistical model to generate a response. However, chatbot responses are not generated based on real-time analysis or any real understanding of the user's intent or the world's current state. Instead, they rely on the patterns and associations learned during training on historical data. For this reason, chatbots may struggle with context-specific queries or information that falls outside their training data.

They can also face difficulties with ambiguous queries or inputs that deviate from the patterns in the training data. In such cases, they may provide generic or nonsensical responses.

So, when asked for scientific references, for example, the language model uses statistical inference to generate responses. It tries to produce relevant and plausible references based on its training, even if those specific references do not exist.

In other words, the model might generate references that sound credible because it has learned the structure and patterns of scientific papers. It doesn't explicitly understand the actual existence or content of specific papers.

Implications for users

To navigate the limitations of chatbot responses, it's best to approach chatbots as tools rather than authoritative sources of information. Yes, they can help with basic inquiries and tasks. But complex or critical matters still call for human verification and independent research.

AI researchers and developers are working on refining chatbot responses. They use techniques such as fine-tuning models, incorporating feedback mechanisms, and addressing biases in training data. With these, they aim to enhance chatbots’ accuracy, relevance, and contextual understanding.

OpenAI, the team behind ChatGPT, has expressed optimism that, with time, the model will be able to use information from the internet and its own deductive reasoning to discern truth from falsehood. GPT-4 has reportedly already shown considerable improvement in this regard.
