
In recent evaluations, OpenAI uncovered a troubling trend: its latest reasoning models, o3 and o4-mini, hallucinate, that is, produce incorrect or fabricated outputs, at significantly higher rates than the earlier reasoning model o1.
- According to reports, o3 hallucinated on 33% of test questions concerning public figures, a stark increase over the earlier o1 model's rate.
- Furthermore, o4-mini demonstrated an alarming 48% hallucination rate on the same kind of questions (the sketch below shows how such a rate is tallied).
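For context, a hallucination rate like the ones above is simply the share of benchmark questions the model answers incorrectly. The following minimal sketch illustrates the idea; the `QAItem` data, `model_answer`, and `is_correct` functions are hypothetical stand-ins, not OpenAI's actual evaluation harness.

```python
from dataclasses import dataclass

@dataclass
class QAItem:
    question: str
    reference_answer: str

def model_answer(question: str) -> str:
    """Placeholder for a call to the model under evaluation."""
    return "..."  # a real harness would query the model's API here

def is_correct(answer: str, reference: str) -> bool:
    """Placeholder grader; real evaluations use stricter matching or human review."""
    return reference.lower() in answer.lower()

def hallucination_rate(items: list[QAItem]) -> float:
    """Fraction of questions where the model's answer fails the grader."""
    wrong = sum(
        not is_correct(model_answer(item.question), item.reference_answer)
        for item in items
    )
    return wrong / len(items)
```

A rate of 0.33 from such a harness means one in three answers was judged incorrect; the harder question, which this sketch does not address, is why newer reasoning models are failing more often.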
OpenAI has acknowledged that further research is needed to understand and address the models' growing propensity to hallucinate. Industry analysts suggest that the added reasoning capabilities themselves may be contributing to the higher error rates. Either way, the findings raise reliability concerns: anyone depending on these models needs to verify their outputs carefully.