
As organizations across industries continue to explore generative AI use cases and bring the technology into workflows at every level, conversations about its inherent challenges are also growing. Hallucination, in which a model returns a partial or total fabrication, is one such challenge, and it is likely to rule out some high-risk use cases at the outset. While this caution is justified in many instances, the trouble many face with AI hallucination is that the concept is not fully understood.

To start with, hallucinations are an inherent feature of generative models rather than a bug. Models generate their outputs stochastically, sampling each token from a probability distribution rather than retrieving a stored fact. A hallucination is simply what we call an output that is inaccurate: the model’s approximation of an ‘answer’ drawn from the information available to it.
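To make the point concrete, here is a minimal, purely illustrative sketch in Python (not a real language model; the candidate words and probabilities are invented) of why sampling from a distribution can yield a plausible but wrong completion on some runs:

```python
# Toy illustration only: the next-token choice is sampled from a probability
# distribution, so repeated runs of the same prompt can produce different text.
import random

# Invented probabilities a model might assign to candidate completions of
# "The capital of Australia is" (illustrative numbers, not real model output).
candidates = ["Canberra", "Sydney", "Melbourne"]
weights = [0.7, 0.2, 0.1]

for run in range(5):
    # random.choices draws one candidate according to the weights, so a
    # plausible-but-wrong completion ("Sydney") can still be sampled.
    completion = random.choices(candidates, weights=weights, k=1)[0]
    print(f"Run {run + 1}: The capital of Australia is {completion}")
```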

For conversational and generative AI use cases that require accurate answers to text-based questions, i.e. language models, hallucinations are, of course, a problem. In these cases, hallucinations may occur when a model has not been trained on enough relevant data to provide an accurate response to a prompt. This may be down to a poor prompt from a user, or the context window (i.e. the amount of text the model can take into account at once, including the user’s prompt and any supporting information) may not be large enough to deliver an accurate result, leading the model to approximate an answer based on the information available. The severity of the hallucination in this case depends on how plausible the incorrect answer appears.
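As a rough sketch of what a fixed context window means in practice, the snippet below counts prompt tokens against an illustrative limit. It assumes the tiktoken library; the 8,192-token figure and the encoding name are assumptions for the example, not properties of any particular model:

```python
# Rough sketch: anything beyond the context window is never seen by the model,
# so it can only approximate an answer from the portion of the prompt that fits.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # assumed encoding for the example
context_window = 8192  # illustrative token limit; varies by model

prompt = "Summarise the following report: " + "lorem ipsum " * 5000
tokens = encoding.encode(prompt)

if len(tokens) > context_window:
    print(f"{len(tokens)} tokens; {len(tokens) - context_window} would be truncated.")
else:
    print(f"Prompt fits: {len(tokens)} of {context_window} tokens.")
```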

Even sophisticated LLMs, trained on enormous amounts of data, can hallucinate when answering fairly basic questions. The results range from minor inaccuracies and misrepresentations to total fabrications. The extent will often depend on the nature of the prompt, but there are ways of forcing a self-correction with further prompts, such as asking ChatGPT to show its workings or to cite specific research or articles that support its original output.
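The same pattern can be scripted. Below is a minimal sketch of a follow-up ‘show your sources’ prompt, assuming the OpenAI Python SDK with an API key in the environment; the model name and the questions are illustrative, not recommendations:

```python
# Sketch of prompting a model to justify or correct its own previous answer.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
model = "gpt-4o-mini"  # illustrative model name

history = [{"role": "user", "content": "When was Finastra founded?"}]
first = client.chat.completions.create(model=model, messages=history)
answer = first.choices[0].message.content
print("Initial answer:", answer)

# Ask the model to cite support for its answer, and to correct it if it cannot.
history += [
    {"role": "assistant", "content": answer},
    {"role": "user", "content": "Cite specific sources that support this answer, "
                                "and correct the answer if you cannot."},
]
second = client.chat.completions.create(model=model, messages=history)
print("Self-check:", second.choices[0].message.content)
```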

For certain text-based use cases, such as a customer service chatbot, hallucinations present a risk because they may give a customer completely erroneous information or misdirection. In other contexts, however, such as text-to-image generative AI models, hallucinations can still present users with viable creative options. For example, a model might be asked to generate an image of Finastra employees celebrating for a recruitment ad. Not every detail is specified (e.g. location, clothing), which gives the model an opportunity to be creative, and if the result needs to be refined, further prompts can be used.

This may not be much of a problem for creative text-to-image use cases, but it raises obvious issues for information retrieval.

Why Do Hallucinations Occur?

Hallucinations can cause problems in a number of contexts and are a key factor in establishing the viability of use cases. In healthcare or financial services, for example, the viability of generative AI use cases can be undermined by biased training data that does not adequately reflect the diversity of a population or the contexts of vulnerable and typically underserved communities. For a chatbot, the inability to consider the unique characteristics and context of a customer or patient could expose them to risk, which is why further mitigation strategies and tools need to be put in place. The bias may stem from a lack of data on an underserved community, causing the model to return information skewed towards the general context of the dominant group or to make something up entirely.

How To Reduce Hallucinations?

It’s unlikely that we will ever fully overcome hallucinations. They are a feature of LLMs because LLMs generate output stochastically, which also affects the explainability of generative AI outputs. While there are methods and tools that can reduce how often hallucinations occur, hallucinations remain part of the package with any use case that harnesses the power of LLMs.

Prompt engineering is a growing field that will increasingly address the problem of hallucinations, since refining inputs is an effective way to produce more accurate outputs. Many generative AI models also include a user feedback feature that lets users rate an output and provide more detailed feedback, which is then used to train the model further and reduce the likelihood of future hallucinations. This is particularly useful for the class of hallucination known as ‘confabulation’, where an LLM gives different answers to exactly the same prompt.
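As an illustration of the prompt-engineering side, the sketch below grounds the model in a supplied context passage, tells it to decline when the answer is not there, and lowers the temperature to reduce run-to-run variation. It again assumes the OpenAI Python SDK; the system instruction, context text, and model name are invented for the example:

```python
# Sketch of a grounded, constrained prompt intended to reduce hallucination.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Curated or retrieved text the answer must come from (illustrative content).
context = "Finastra is a financial software company headquartered in London."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    temperature=0,  # lower temperature reduces variation between identical prompts
    messages=[
        {
            "role": "system",
            "content": "Answer only from the context provided. If the context "
                       "does not contain the answer, say you do not know.",
        },
        {
            "role": "user",
            "content": f"Context: {context}\n\nQuestion: Where is Finastra headquartered?",
        },
    ],
)
print(response.choices[0].message.content)
```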

Research and development in AI is advancing rapidly, along with wider investment in the field, so we are likely to keep seeing new techniques and tools for reducing and mitigating hallucinations. Beyond the technological approach, it’s clear that reducing hallucinations requires education and upskilling for those who seek to operationalize generative AI in a significant way. We are now in an era where every person within an organization can add value to their workflows using generative AI. That is why establishing best practices around adoption, and being aware of the issues inherent in the technology, such as hallucinations, is essential.