Another Bite at the Apple
Long‑time readers know I’ve been watching Apple’s Apple Intelligence¹ story carefully since it was first teased a year ago at WWDC 24. The pitch was appealing: instead of making AI a separate thing, Apple would quietly weave GenAI into cool new features that magically work. One year later, the execution of this strategy has been mixed. Genmoji launched but never really went viral; it turns out nobody wants to spend five minutes making the perfect emoji. Meanwhile, the much-anticipated “smarter, more proactive” Siri is still not here. That meant high expectations for AI announcements at WWDC 25 a few weeks ago. If you’ve seen the headlines, you probably saw that many critics were disappointed, but I see some signs of hope. I think there are important lessons in Apple’s approach for other companies and investors, which I’ll get to below.
At the same time as these announcements, Apple also grabbed headlines with a new paper called “The Illusion of Thinking,” which argues that LLMs aren’t as smart as we think. I’ll also take a bit of a digression and show why I don’t think its conclusions hold up.
Hard to See Past Liquid Glass
If you mostly skimmed the headlines during WWDC, you probably saw plenty along the lines of “Apple’s Liquid Glass redesign doesn’t look like much” and figured Apple punted on AI this year. Liquid Glass is the new UI for OS 26.² I downloaded the beta and have been using Liquid Glass for a couple of weeks. It looks pretty, but I’m not sure it’s a huge leap forward.³
However, Apple also announced some interesting AI features. If they launch successfully (a big if at this point), they will be very impressive. The one that most caught my attention was Live Translation. Apple claims that in iOS 26 I’ll be able to speak English on a phone call while my friend talks in French, and each of us will hear the other in our native tongue, near‑instantaneously. That’s pretty close to a Star Trek-level Universal Translator, and if it works, it could change the way people work and live. FaceTime will get real-time captions, and Messages will get real-time translation. Those definitely seem easier.
They also advertised some other AI features:
Call Screening – your iPhone silently answers unknown numbers, gets the caller’s name/reason, and shows a transcript before you decide to pick up
Hold Assist – the phone stays on the customer‑service line so you don’t have to listen to hold music, though as automated customer-service agents become more common, this will become less useful
Intelligent Order Tracking – Mail/Wallet now parse shipping emails and surface all your packages in one timeline
Except for Live Translation, these all seem eminently doable with today’s models and technology. They also fit Apple’s thesis: make AI invisible, make the product better. We’ll see if they can deliver!
As an aside, I think this is an interesting model for other companies to consider. Many companies are just starting to launch AI features in their products this year. The big challenge will be getting customers to use them and get value from them, in order to justify price increases and/or make the product stickier. Getting users to change behavior is hard, and companies will have more success adding superpowers to existing features than adding a separate tab that launches the AI feature. Of course, Apple hasn’t really delivered on this yet; it’s hard to do.
Thinking Different
Oddly, the same week as these announcements, Apple’s ML research group dropped a paper ominously titled “The Illusion of Thinking.” In short, they argue that “thinking” LLMs aren’t actually thinking after all. They believe that the reason the bots do so well on hard math problems is that those problems are actually in their training data. So they tested them on pure logic puzzles instead and found that the models fail once a certain level of complexity is reached. Many of you sent me this paper and asked what it means. I’ll explain the problems with the methodology in a second, but first, a brief digression to explain what the bots were asked to do.
They tested the bots on four logic puzzles, but Tower of Hanoi is probably the most familiar because there’s a toy for toddlers based on it. You can see an image below: the Tower of Hanoi is a wooden puzzle with three pegs and n disks of decreasing size stacked on one peg. The goal: move the entire stack to another peg, moving one disk at a time and never placing a larger disk on top of a smaller one. The optimal solution takes 2^n - 1 moves, so the text and thinking required to do the task grows exponentially with the number of disks.
Apple asked models to write out the full move‑by‑move solution as the disk count increased. They report the models fail at eight disks—the bots essentially give up.
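To make that growth concrete, here’s a quick back-of-envelope sketch (nothing from the paper, just the 2^n - 1 formula evaluated for a few disk counts):

```python
# Optimal Tower of Hanoi solution length: 2**n - 1 moves for n disks
for n in (3, 7, 8, 10, 15, 20):
    print(f"{n} disks -> {2**n - 1:,} moves")

# 3 disks -> 7 moves
# 7 disks -> 127 moves
# 8 disks -> 255 moves
# 10 disks -> 1,023 moves
# 15 disks -> 32,767 moves
# 20 disks -> 1,048,575 moves
```

Spelling out each of those moves in prose, with reasoning attached, is what chews through the token budget.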
However, the fine print complicates this picture:
No coding allowed. The very first thing the models said was, “I can write Python to generate the sequence.” Apple disallowed that even though generating code is usually a valid strategy for solving a math problem.
64K token limit. Thinking about every move for eight‑plus disks runs past 64,000 tokens, the limit Apple imposed. Models noticed they were going to blow past the budget and gave up.⁴
Those constraints explain the collapse better than “the model can’t reason.” A rebuttal paper from Anthropic supports these points and also points out that one of the other logic puzzles they provided (River Crossing with N>5) is actually unsolvable.
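On the first point, the program the models offered to write is a textbook exercise. Here’s a minimal sketch of what such a generator might look like (my own illustration, not code from the paper or from any model transcript):

```python
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the optimal move list for n disks using the classic recursion."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # park the top n-1 disks on the spare peg
    moves.append((source, target))              # move the largest disk to the target
    hanoi(n - 1, spare, target, source, moves)  # stack the n-1 disks back on top of it
    return moves

print(len(hanoi(8)))  # 255 -- the move list the models were asked to spell out in prose
```

A dozen lines handle any number of disks. What the paper actually measures is a model’s willingness to write every move out token by token under a fixed budget, which is a very different skill from knowing how to solve the puzzle.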
So, while I don’t believe the “thinking models” are really thinking the way a human would, they can solve lots of complicated problems, including ones that are tough for humans.⁵
Back to Apple
Setting aside that paper, Apple is still betting that users don’t care how products work, only that the products continue to bring new conveniences. Call Screening, Order Tracking, and especially Live Translation are great litmus tests. If they land smoothly this fall, the griping about Apple will be forgotten. If they miss, either by not shipping these features at all or by shipping features that don’t work, skepticism will grow and shareholders may start to get anxious.
Either way, I’ve met with a lot of software companies in a similar situation to Apple’s: they have good ideas for GenAI features but are struggling to roll them out. Getting GenAI truly integrated into the product and shipping it is hard. But, just as with Apple, investors are watching and running out of patience, so let’s hope many of these features start to appear soon, at Apple and elsewhere.
¹ Long-time readers will also remember how annoying this name is. Obviously, we can’t abbreviate Apple Intelligence “AI.” So, we have to write out the whole thing every time.
² Apple decided to renumber all their operating systems across devices so they share the same number (26), which corresponds to the model year. For younger readers: Microsoft used to do this too, with Windows 95, for example.
³ I really don’t care about aesthetics for my icons. When I was growing up, I spent hundreds of hours playing NES games where, for example, a few colored squares glued together were supposed to be an Italian plumber rescuing a princess. I can suspend disbelief pretty easily. I’m sure people spent thousands of hours redesigning the icons for Camera, Podcasts, etc., but it really makes no difference to me as long as I know what to click.
⁴ Imagine you were told to solve a math problem and show all work, but you’re only given 10 sheets of paper. If you get to a point where you are 10% of the way through the problem, and you only have 2 sheets left, you might just stop and say it’s not going to work, which is reasonable.
⁵ There’s an interesting philosophical question about the nature of thinking here. Society is constantly moving the goalposts on what it means for an AI to be intelligent. For example, GenAI passed the Turing Test last year, but nobody thinks it’s really sentient yet. If we ever do create truly intelligent artificial life, we probably won’t realize it until long afterwards.