OpenAI’s New o3 and o4-mini AI Models Hallucinate Even More, Tests Reveal

OpenAI’s latest reasoning AI models, o3 and o4-mini, are making headlines for the wrong reasons: they hallucinate more than their predecessors. Despite improved performance in coding and math, internal OpenAI benchmarks show o3 hallucinated on 33% of factual queries about people, while o4-mini reached a staggering 48%. That is roughly double the rate of older models like o1 and o3-mini.
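
To put those percentages in context, here is a minimal, hypothetical sketch of how a PersonQA-style hallucination rate could be tallied: answers flagged as containing a false claim divided by total factual queries. The helper names and toy grading logic below are illustrative assumptions, not OpenAI's actual evaluation code.

```python
# Hypothetical sketch: hallucination rate = flagged answers / total factual queries.
# contains_false_claim() is a stand-in grader, not OpenAI's real tooling.

def contains_false_claim(answer: str, known_facts: set[str]) -> bool:
    """Toy grader: flag the answer if it asserts a claim not in the reference set."""
    claimed = {line.strip().lower() for line in answer.splitlines() if line.strip()}
    return any(claim not in known_facts for claim in claimed)

def hallucination_rate(results: list[tuple[str, set[str]]]) -> float:
    """results: (model answer, reference facts) pairs, one per factual query."""
    if not results:
        return 0.0
    flagged = sum(contains_false_claim(ans, facts) for ans, facts in results)
    return flagged / len(results)

# Example: 1 hallucinated answer out of 3 queries -> ~33%, the rate reported for o3.
sample = [
    ("ada lovelace wrote the first published algorithm",
     {"ada lovelace wrote the first published algorithm"}),
    ("alan turing was born in 1950",
     {"alan turing was born in 1912"}),
    ("grace hopper helped create cobol",
     {"grace hopper helped create cobol"}),
]
print(f"hallucination rate: {hallucination_rate(sample):.0%}")  # -> 33%
```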

Worse yet, OpenAI admits it doesn’t fully understand why. Its report states that “more research is needed” to explain why scaling up reasoning models has worsened hallucination rates. Independent testers at Transluce confirmed that o3 invents actions it can’t perform, like claiming to run external code. Experts warn this poses risks for use cases where accuracy is non-negotiable, such as law, medicine, and enterprise software.

As AI models pivot toward reasoning over brute-force data scaling, the balance between creativity and reliability is proving fragile. OpenAI insists improvements are ongoing, but for now, hallucinations remain an unresolved AI flaw.

Abishek D Praphullalumar