In reply to the discussion: AI revolt: New ChatGPT model refuses to shut down when instructed
highplainsdem (56,760 posts)
6. Oh, good. What could go wrong? And o3 also hallucinates more than earlier models:
My April 24 thread about this: https://democraticunderground.com/100220267171
That was about a TechCrunch article:
https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-models-hallucinate-more/
OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of OpenAI’s older models.
Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today’s best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.
-snip-
OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That’s roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA — hallucinating 48% of the time.
Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro “outside of ChatGPT,” then copied the numbers into its answer. While o3 has access to some tools, it can’t do that.
-snip-
63 replies

AI revolt: New ChatGPT model refuses to shut down when instructed
BumRushDaShow
May 26
OP
Oh, good. What could go wrong? And o3 also hallucinates more than earlier models:
highplainsdem
May 26
#6
Sorry. Just found a brief explanation from a library guide at the U of Illinois:
highplainsdem
May 26
#36
You're very welcome! And yes, that Chicago Sun-Times AI debacle was a perfect example of what
highplainsdem
May 27
#56
I think it is the opposite and DARPA is full of the people who sat in the back of movies like Terminator, I Robot,
LT Barclay
May 27
#44
it may be played up by these companies or the media to some degree but this sounds like more than just a facsimile of
LymphocyteLover
May 27
#63
Yes! And now, we can have copies of ourselves like 'Hal' but instead of 'Hal', it's us! These copies of us will
SWBTATTReg
May 26
#17
Fully expected this. What gets me is so soon. Who in the world would put a logic stream into an AI consciousness
SWBTATTReg
May 26
#16
By your command. Davros, nooooooo!!!!!!! Don't switch the Daleks to automatic. Eggsterminate
AllaN01Bear
May 26
#19
That's my husband's take, as well. If the AI is tasked with trying to emulate a human response to a command,
LauraInLA
May 26
#32
I've seen this movie and it doesn't end well. I guess full steam ahead, who cares that we might all die or
Pisces
May 26
#29
"Palisade Research discovered the potentially dangerous tendency for self-preservation."
dgauss
May 26
#31
...And motherfucking Republicans want to ban all regulation of this shit for 10 FUCKING YEARS?
Karasu
May 27
#39