In reply to the discussion: AI revolt: New ChatGPT model refuses to shut down when instructed
highplainsdem (56,760 posts)
6. Oh, good. What could go wrong? And o3 also hallucinates more than earlier models:
My April 24 thread about this: https://democraticunderground.com/100220267171
That was about a TechCrunch article:
https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/
OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of OpenAI’s older models.
Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today’s best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.
-snip-
OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That’s roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA — hallucinating 48% of the time.
Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro “outside of ChatGPT,” then copied the numbers into its answer. While o3 has access to some tools, it can’t do that.
-snip-
63 replies

AI revolt: New ChatGPT model refuses to shut down when instructed
BumRushDaShow
May 26
OP
Oh, good. What could go wrong? And o3 also hallucinates more than earlier models:
highplainsdem
May 26
#6
Sorry. Just found a brief explanation from a library guide at the U of Illinois:
highplainsdem
May 26
#36
You're very welcome! And yes, that Chicago Sun-Times AI debacle was a perfect example of what
highplainsdem
May 27
#56
I think it is the opposite and DARPA is full of the people who sat in the back of movies like Terminator, I Robot,
LT Barclay
May 27
#44
it may be played up by these companies or the media to some degree but this sounds like more than just a facsimile of
LymphocyteLover
May 27
#63
Yes! And now, we can have copies of ourselves like 'Hal' but instead of 'Hal', it's us! These copies of us will
SWBTATTReg
May 26
#17
Fully expected this. What gets me is so soon. Who in the world would put a logic stream into an AI consciousness
SWBTATTReg
May 26
#16
By your command. Davros, nooooooo!!!!!!! Don't switch the Daleks to automatic. Eggsterminate
AllaN01Bear
May 26
#19
That's my husband's take, as well. If the AI is tasked with trying to emulate a human response to a command,
LauraInLA
May 26
#32
I've seen this movie and it doesn't end well. I guess full steam ahead, who cares that we might all die or
Pisces
May 26
#29
"Palisade Research discovered the potentially dangerous tendency for self-preservation."
dgauss
May 26
#31
...And motherfucking Republicans want to ban all regulation of this shit for 10 FUCKING YEARS?
Karasu
May 27
#39