
Siri Chatbot prototype nears ChatGPT quality, but hallucinates more than Apple wants

Internally, Siri is close to ChatGPT-level accuracy.

A new report claims that internally, Apple has already been testing Large Language Models for Siri that are vastly more powerful than the shipping Apple Intelligence, but executives disagree about when to release it.

Backing up AppleInsider's position that Apple is not behind on AI, the company has regularly published research showing its progress in moving the field forward. Now, according to Bloomberg, Apple has already been working with AI systems considerably more powerful than the on-device Apple Intelligence it has shipped so far.

Specifically, the report says that internally, Apple is using multiple models of increasing complexity. Apple is said to be testing models with 3 billion, 7 billion, 33 billion, and 150 billion parameters.

For comparison, Apple said in 2024 that Apple Intelligence's foundation language models were on the order of 3 billion parameters.

That version of Apple Intelligence is intentionally small so that it can run on-device, instead of requiring all prompts and requests to be sent to the cloud. The larger versions are cloud-based, and the 150 billion parameter model is now said to approach the quality of ChatGPT's most recent releases.
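As a rough illustration of why those sizes split the way they do, the weight-storage arithmetic alone tells the story. The parameter counts below come from the report; the quantization levels are illustrative assumptions, not anything Apple has disclosed.

def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory needed just to hold a model's weights."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1024**3

# Parameter counts from the report; 16-bit vs. an aggressive 4-bit quantization.
for params in (3, 7, 33, 150):
    for bits in (16, 4):
        print(f"{params:>4}B params @ {bits:>2}-bit: "
              f"{weight_footprint_gb(params, bits):6.1f} GB")

# ~3B at 4-bit is roughly 1.4 GB, plausible within an iPhone's memory budget.
# ~150B even at 4-bit is roughly 70 GB, firmly server territory.

Even before accounting for activations and context, the largest model is around two orders of magnitude beyond what a phone can hold, which is why it would have to live in the cloud.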

However, there reportedly remain concerns over AI hallucinations. Apple is said to have held off releasing this Apple Intelligence model in part because of this, implying that the level of hallucinations is too high.

There is said to be another reason for not yet shipping this cloud-based and much improved Siri Chatbot, though. It is claimed that there are philosophical differences between Apple's senior executives over the release.

It's conceivable that these differences solely concern each executive's tolerance for hallucinations in Apple Intelligence. However, there is no further detail.

Previously, it was reported that former Siri chief John Giannandrea is against releasing it yet, while others on the executive staff are keener to launch a Siri Chatbot.

Perhaps because of this internal disagreement, it is now also claimed that there will be fewer Apple Intelligence announcements at WWDC than expected. The whole WWDC keynote is said to be smaller in scope than in previous years, but it is still expected to feature a dramatic redesign of the software for the Mac, iPhone, and iPad, exploiting the user interface lessons Apple learned on the Apple Vision Pro.

12 Comments

foregoneconclusion 13 Years · 3028 comments

The "power" comes from the database that is used to train the LLM. The LLM itself is worthless without it. 

1 Like · 1 Dislike
AppleZulu 9 Years · 2459 comments

In a nutshell, this explains why Apple is “behind” with AI, but actually isn’t. 

It’s remarkable the consistency with which this pattern repeats, yet even people who consider themselves Apple enthusiasts don’t see it. Tech competitors “race ahead” with an iteration of some technology, while Apple seemingly languishes. Apple is doomed. Then Apple comes out “late” with their version of it, and the initial peanut gallery reception pronounces it too little, too late.

Then within a couple of years, Apple’s version is the gold standard and the others -those cutting-edge innovators- race to catch up, because “first” is often also “half-baked.”

In the news this week, it was exposed that RFK Jr’s “Make America Healthy Again” report was evidently an AI-produced document, replete with hallucinations, most notably in the bibliography. Of course it was. This is what happens when the current cohort of AI models is uncritically used to produce a desired result, without any understanding of how profoundly bad these AI models are.

When I read about this in the news, I decided to experiment with it myself. Using MS Copilot -in commercial release as part of MS Word- I picked a subject and asked for a report taking a specific, dubious position on it, with citations and a bibliography. After it dutifully produced the report, I started checking the bibliography and, one after another, failed to find the research papers that Copilot used to back the position taken. I didn’t check all the references, so it’s possible some citations were real, but finding several that weren’t was sufficient to bin the whole thing.

It’s bad enough when humans intentionally produce false and misleading information, but when a stock office product will do it for you with no disclaimers or warnings, should that product really be on the market? I also once asked ChatGPT to write a story about green eggs and ham, in the style of Dr. Seuss. It then plagiarized the actual Seuss story, almost verbatim, in a clear abuse of copyright law. This is the stuff that Apple is supposedly trailing behind.
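For anyone who wants to repeat that bibliography check more systematically, here is a minimal sketch against the public Crossref index. The citation strings are hypothetical placeholders, not the ones Copilot actually produced, and Crossref only covers works registered with it.

import requests

# Hypothetical placeholders -- paste in the generated bibliography instead.
citations = [
    "A hypothetical paper title from the generated bibliography",
    "Another generated citation to verify",
]

for cited_title in citations:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": cited_title, "rows": 1},
        timeout=10,
    )
    items = resp.json()["message"]["items"]
    print(f"query: {cited_title}")
    if items:
        best_title = (items[0].get("title") or ["<untitled>"])[0]
        print(f"  best indexed match: {best_title}")
    else:
        print("  nothing indexed at all")

Crossref search is fuzzy, so it returns its best guess even for invented titles; the tell is a top match whose title and authors bear no resemblance to the citation. A human still has to make that final comparison, which is exactly the point.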

So the report here that Apple is developing AI but, unlike their “cutting edge” competitors, not releasing something that produces unreliable garbage, suggests that no, they’re not behind. They’re just repeating the same pattern again of carefully producing something of high quality and reliability, and in a form that is intuitively useful, rather than a gimmicky demonstration that they can do a thing, whether it’s useful or not. Eventually they’ll release something that consistently produces reliable information, and likely does so while respecting copyright and other intellectual property rights. The test will be that not only will it be unlikely to hallucinate in ways that mislead or embarrass its honest users, it will actually disappoint those with more nefarious intent. When asked to produce a report with dubious or false conclusions, it won’t comply like a sociopathic sycophant. It will respond by telling the user that the reliable data not only doesn’t support the requested position, but actually refutes it. Hopefully this will be a feature that Apple uses to market their AI when it’s released.

P.S. As a corollary, the other thing that Apple is likely concerned with (perhaps uniquely so) is AI model collapse. This is the feedback loop where AI training data is scooped up from sources that include AI-produced hallucinations, not only increasing the likelihood that the bad data will be repeated, but reducing any ability for the AI model to discern good data from bad. Collapse occurs when the model is so poisoned with bad data that even superficial users find the model to be consistently wrong and useless. Effectively every query becomes an unamusing version of that game where you playfully ask for “wrong answers only.” Presumably the best way to combat that is to train the AI as you would a human student: start by giving it information sources known to be reliable, and eventually train it to discern those sources on its own. That takes more time. You can’t just dump the entire internet into it and tell it that the patterns repeated the most are most likely correct. 
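To make that feedback loop concrete, here is a toy simulation. Every number in it is invented for illustration; the point is the shape of the curve, not the values.

def next_error_rate(current_error: float, synthetic_fraction: float,
                    base_error: float = 0.02, amplification: float = 2.5) -> float:
    """One training generation: clean human data mixed with synthetic data
    that carries (and amplifies) the previous generation's errors."""
    human_part = (1 - synthetic_fraction) * base_error
    synthetic_part = synthetic_fraction * min(1.0, current_error * amplification)
    return human_part + synthetic_part

error = 0.02  # start near the error rate of a model trained on curated data
for generation in range(1, 11):
    error = next_error_rate(error, synthetic_fraction=0.5)
    print(f"generation {generation:2d}: error rate ~{error:.3f}")

# With half the corpus synthetic and errors amplified on each pass, the rate
# compounds generation over generation; keep synthetic_fraction * amplification
# below 1 and it stays bounded -- one way to read the value of curated data.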

P.P.S. I just repeated the above experiment in Pages, using Apple’s link to ChatGPT. It also produced hallucinated references. I chased down the first citation in the bibliography it created. Searching for the cited article didn’t turn up anything. I did find the cited journal, went to it and searched for the cited title, got nothing. Searched for the authors, got nothing. Finally, I browsed to find the issue supposedly containing the referenced article, and that article does not exist. So Apple gets demerits for subbing in ChatGPT in their uncharacteristic worry that they not be perceived as being “late.” This part does not fit their usual pattern, with the exception perhaps of their hastened switch to Apple Maps, based largely at first on third-party map data. In the long run, their divorce from Google Maps was important, as location services was rapidly becoming a core OS function, not just a sat nav driving convenience that could adequately be left to third-party apps. The race to use AI is perhaps analogous, but the hopefully temporary inclusion of ChatGPT’s garbage should be as embarrassing as those early Apple Maps with bridges that went underwater, etc.

13 Likes · 1 Dislike
JinTech 10 Years · 1091 comments

AppleZulu said:

P.P.S. I just repeated the above experiment in Pages, using Apple’s link to ChatGPT. It also produced hallucinated references. […]
This is why Apple puts up a caution that you are using ChatGPT and not Siri. It would be embarrassing if Apple baked ChatGPT into Siri and passed this content off as its own.

1 Like · 0 Dislikes
charlesn 12 Years · 1488 comments

AppleZulu said:
In a nutshell, this explains why Apple is “behind” with AI, but actually isn’t. 

Indeed. No doubt Apple is "behind" in the technosphere press and on comment boards like this one for the tech-obsessed (and I include myself), but these arenas are hardly representative of the mainstream buyers who make up most of Apple's customer base. Here's what Apple is mainly selling: "it just works" ease of use, seamless integration with other Apple products, and privacy/security. If this were still a hardware/software "features" war, Apple would have lost it long ago. Instead, they have been and remain the most successful consumer electronics company in history.

5 Likes · 0 Dislikes
ssfe11 1 Year · 169 comments

So will Apple even need ChatGPT if Apple's models are on the same level? I guess, hey, why not?