OpenAI’s New o3 and o4-mini AI Models Hallucinate Even More, Tests Reveal

OpenAI’s latest reasoning AI models, o3 and o4-mini, are making headlines for the wrong reasons: they hallucinate more than their predecessors. Despite improved performance in coding and math, internal OpenAI benchmarks show o3 hallucinated on 33% of factual queries about people, while o4-mini reached a staggering 48%. That is roughly double the rate of older models like o1 and o3-mini.
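
To put those percentages in context, here is a minimal, hypothetical sketch of how a PersonQA-style hallucination rate could be tallied: answers flagged as containing a false claim divided by total factual queries. The helper names and toy grading logic below are illustrative assumptions, not OpenAI's actual evaluation code.

```python
# Hypothetical sketch: hallucination rate = flagged answers / total factual queries.
# contains_false_claim() is a stand-in grader, not OpenAI's real tooling.

def contains_false_claim(answer: str, known_facts: set[str]) -> bool:
    """Toy grader: flag the answer if it asserts a claim not in the reference set."""
    claimed = {line.strip().lower() for line in answer.splitlines() if line.strip()}
    return any(claim not in known_facts for claim in claimed)

def hallucination_rate(results: list[tuple[str, set[str]]]) -> float:
    """results: (model answer, reference facts) pairs, one per factual query."""
    if not results:
        return 0.0
    flagged = sum(contains_false_claim(ans, facts) for ans, facts in results)
    return flagged / len(results)

# Example: 1 hallucinated answer out of 3 queries -> ~33%, the rate reported for o3.
sample = [
    ("ada lovelace wrote the first published algorithm",
     {"ada lovelace wrote the first published algorithm"}),
    ("alan turing was born in 1950",
     {"alan turing was born in 1912"}),
    ("grace hopper helped create cobol",
     {"grace hopper helped create cobol"}),
]
print(f"hallucination rate: {hallucination_rate(sample):.0%}")  # -> 33%
```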

Worse yet, OpenAI admits it doesn’t fully understand why. Its report states that “more research is needed” to explain why scaling up reasoning models has worsened hallucination rates. Independent testers at Transluce confirmed that o3 invents actions it can’t perform, like claiming to run external code. Experts warn this poses risks for use cases where accuracy is non-negotiable, such as law, medicine, and enterprise software.

As AI models pivot toward reasoning over brute-force data scaling, the balance between creativity and reliability is proving fragile. OpenAI insists improvements are ongoing, but for now, hallucinations remain an unresolved AI flaw.

Abishek D Praphullalumar