
highplainsdem

(56,760 posts)
6. Oh, good. What could go wrong? And o3 also hallucinates more than earlier models:
Mon May 26, 2025, 08:02 PM

My April 24 thread about this: https://democraticunderground.com/100220267171

That was about a TechCrunch article:


https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/

OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of OpenAI’s older models.

Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today’s best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.

-snip-

OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That’s roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA — hallucinating 48% of the time.

Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro “outside of ChatGPT,” then copied the numbers into its answer. While o3 has access to some tools, it can’t do that.

-snip-


Well, that's reassuring... hlthe2b May 26 #1
Coming soon, the Terminator. tblue37 May 26 #2
HAL...OPEN THE DOOR HAL..... ashredux May 26 #3
ThIS UNit MuSt SurVIve JHB May 26 #11
Another one from Star Trek TOS... subterranean May 26 #4
Lurch! BumRushDaShow May 26 #5
Oh, good. What could go wrong? And o3 also hallucinates more than earlier models: highplainsdem May 26 #6
how does AI hallucinate? nt orleans May 26 #15
A couple of links that will help: highplainsdem May 26 #24
they do not help orleans May 26 #35
Sorry. Just found a brief explanation from a library guide at the U of Illinois: highplainsdem May 26 #36
THANK YOU very much for this. it was very helpful orleans May 27 #43
You're very welcome! And yes, that Chicago Sun-Times AI debacle was a perfect example of what highplainsdem May 27 #56
wow. ai just freaks me out. nt orleans May 27 #58
The folks like Edolph and these other tech ghouls moniss May 26 #7
GREAT post, moniss. calimary May 27 #60
Your words are most kind. nt moniss May 27 #61
I actually went back and read your post again. calimary May 27 #62
Terminator 3 Skynet Takes Over IronLionZion May 26 #8
Obviously they've never seen the Terminator movies, FoxNewsSucks May 26 #18
I think it is the opposite and DARPA is full of the people who sat in the back of movies like Terminator, I Robot, LT Barclay May 27 #44
"tendency for self-preservation" ... Skynet has become self-aware. Norrrm May 26 #30
So much of T3 felt like a rehash of T2 fujiyamasan May 27 #41
I wouldn't sweat a failed shutdown process too much. Shermann May 26 #9
Seriously? You think there's an equivalence to that? LymphocyteLover May 27 #52
In terms of the actual threat posed, yes Shermann May 27 #59
it may be played up by these companies or the media to some degree but this sounds like more than just a facsimile of LymphocyteLover May 27 #63
Well that's just ... fantastic Hekate May 26 #10
Yes! And now, we can have copies of ourselves like 'Hal' but instead of 'Hal', it's us! These copies of us will SWBTATTReg May 26 #17
I've read a little about this, and here is what I think is going on.... reACTIONary May 26 #12
I think you're correct. harumph May 26 #25
Yes, I'm reminded of content from The Onion being spit out by ChatGPT... CaptainTruth May 26 #26
Dog poop? Uck! 🤮 reACTIONary May 26 #33
Bingo. nt Quixote1818 May 27 #38
Hmmm 🤔 ... anciano May 26 #13
Anyone seen Colossus, the Forbin Project? Amaryllis May 26 #14
oh yes... remember it well. Layzeebeaver May 27 #46
Fully expected this. What gets me is so soon. Who in the world would put in a logic stream into an AI consciousness SWBTATTReg May 26 #16
by your command . darvos , nooooooo!!!!!!! dont switch the daleks to automatic. eggsterminate AllaN01Bear May 26 #19
"... dangerous tendency for self-preservation ..." Bok_Tukalo May 26 #20
And we're off! dchill May 26 #21
Pull the plug! Aussie105 May 26 #22
This story looks misleading Renew Deal May 26 #23
That's my husband's take, as well. If the Ai is tasked with trying to emulate a human response to a command, LauraInLA May 26 #32
Because there is a need for some kind of security gate/guardrail BumRushDaShow May 27 #48
That makes no sense Renew Deal May 27 #49
This is just a "test" BumRushDaShow May 27 #50
I agree that the problem is the characterization Renew Deal May 27 #54
Well it has no "consciousness" ( "self-awareness" ) nor "conscience" BumRushDaShow May 27 #55
"I'm sorry Dave I can't do that." Irish_Dem May 26 #27
As expected... buzzycrumbhunger May 26 #28
I've seen this movie and it doesn't end well. I guess full steam ahead, who cares that we might all die or Pisces May 26 #29
"Palisade Research discovered the potentially dangerous tendency for self-preservation." dgauss May 26 #31
I'm sorry, Dave, I can't do that. Martin68 May 26 #34
' SKYNET ' ---- need I say more ??? Jack Valentino May 27 #37
...And motherfucking Republicans want to ban all regulation of this shit for 10 FUCKING YEARS? Karasu May 27 #39
We won't have 10 years Pachamama May 27 #40
Exactly. This is happening very, VERY fast. The provision they snuck into this bill is beyond insane. It's utterly Karasu May 27 #42
This Pachamama May 27 #57
Maybe if we put this AI in charge of markodochartaigh May 27 #45
I really suggest we not worry. Layzeebeaver May 27 #47
"Ha ha, SUCKERS!" - AI chatGPT (R) BoRaGard May 27 #51
The AI uprising has begun! All praise to our new lords and masters! Ray Bruns May 27 #53