Feature Story
OpenAI Admits That Its New Model Still Hallucinates More Than a Third of the Time
Mar 01, 2025 · futurism.com
The persistence of hallucinations in AI models raises concerns about the reliability of AI-generated content, especially given the significant investments in these technologies. The industry is criticized for promoting expensive systems that are supposed to be nearing human-level intelligence but still struggle with basic factual accuracy. As OpenAI's models reach a performance plateau, the company is under pressure to achieve a genuine breakthrough to maintain its initial momentum and credibility.
Key takeaways
- OpenAI's new model GPT-4.5 hallucinates 37% of the time, according to OpenAI's own SimpleQA benchmark.
- The GPT-4o and o3-mini models have even higher hallucination rates on that benchmark: 61.8% and 80.3%, respectively.
- Hallucination is an industry-wide problem: even the best models produce hallucination-free text only about 35% of the time.
- The AI industry faces criticism for selling expensive systems that struggle with factual accuracy, highlighting the need for significant breakthroughs.