Grading My 17 AI Predictions

I made a bunch of predictions in January. How am I doing?

Jun 26, 2025

My most popular post so far was the set of predictions I published back in January. We’re halfway through 2025, so it’s time to revisit them and see which ones have already come true, which ones are on track, and which ones seem a bit less likely. Overall, I’m pretty happy with how I did, but you can decide for yourself what you think of my prognostication skills.[1]

1. No GPT‑5, but smarter thinkers emerge

OpenAI hasn’t released GPT‑5 yet, and new models that were trained on bigger datasets (e.g., GPT-4.5) have been unimpressive. Status: Correct so far, looks likely to happen
As I predicted, thinking models like o3 have pushed the bounds of what AI can do.[2] New thinking models that can solve increasingly complex problems have appeared: o3-pro, o4‑mini, Anthropic’s chain‑of‑thought Opus upgrade, DeepSeek, and Manus. Status: Happened
My stretch goal was for GenAI to solve an unsolved problem in math or physics. This hasn’t happened yet, and I’m not sure current models can deliver it. I’d give less than 50% odds that it happens this year, but it’s still possible. Status: Still possible

2. Agents take center stage

I predicted that agents would be able to do a lot more this year: handle complex tasks on the web, make phone calls, manage calendars, etc. We’re definitely closer. OpenAI launched Operator two weeks after my prediction post, and MCP (Model Context Protocol) has become increasingly popular. These tools allow bots to interact with websites and apps. They don’t work perfectly yet and can’t make phone calls, but I think we’re on track for this to come true. Status: Likely to happen

3. Self‑driving goes mainstream

As readers know, I’m very excited about self-driving cars. Here’s where we are:

Waymo is doing roughly 250K paid rides per week and has expanded into Los Angeles and Austin, with Miami and D.C. coming soon. I predicted we’d get to at least 200K. Status: Happened
There hasn’t been a TV episode centered on a robotaxi yet, but robotaxis are becoming more a part of pop culture, with mentions on late night shows. I’m still waiting on the mainstream TV episode, but I feel good that it’s coming. Status: Likely to happen
On the automated trucking front, there’s real progress. Aurora’s driverless trucks are hauling freight between Dallas and Houston. There are 10 trucks, and they covered 1,200 miles in the first few weeks. I predicted at least one company would get past pilot, and I think this still counts as a late-stage pilot (although ChatGPT was willing to count this prediction as “happened”). Status: Very likely to happen

4. AIO (AI‑optimization) Services

The fact that users are starting to search on Perplexity and ChatGPT or use Google’s AI-generated answers is a big deal in the ad industry. Many companies I work with have seen notable declines in search traffic due to AI.

I predicted that big agencies would announce AIO offerings, and at Cannes Lions, Publicis, Omnicom, WPP, Adobe, etc. all announced these. Status: Happened
Perplexity is getting ready to offer ads at scale. They have limited advertising on their free tier already. So, ads are coming. Also, I had a weird experience with ChatGPT where it served me unrelated product recommendations a couple of times. So, I think they are testing something around product research, but I don’t know if it will be sponsored. Status: Very likely to happen

5. Token Economics

I predicted that token costs for thinking models would fall by at least 10%. In June, OpenAI cut the price of o3 inference by 80%. Status: Happened, but by way more than I predicted

6. First All‑AI TV Commercial

Coign used Google Veo to generate a credit card spot that hit national airwaves during the NBA Finals.[3] My call that fully AI-generated TV episodes would not be feasible still holds true. Status: Happened, but I need to wait until year end to grade the TV episode prediction.

7. Call‑Center Deflection

I’m still seeing rapid adoption of automated solutions. Here’s an example of Verizon deflecting 20M calls a month with AI. I couldn’t find data to say for sure that call volumes are down 5-10%. I’m not certain we’ll get to that level – it still feels like a stretch. Status: Trending the right way, unclear if it will happen

8. Sales‑Enablement Emails

I believe there’s a lot of value in AI for sales enablement. As this article points out, implementing high-quality tools in Agentforce (Salesforce’s agent platform) seems to be hard, and few companies are really doing it. At Bain, we rolled out a tool that adds a button to articles on Bain.com; when an employee clicks it, the tool automatically generates an email to share the article with someone. But I have not yet received an email that I thought came from AI, and my prediction was to get 10+ this year. Of course, the AI could be so good that I’m not able to detect it… Status: Still possible

9. Warehouse Robots Rule, Humanoids Remain on YouTube

My prediction that GenAI-powered warehouse robots would be touted by at least three firms has come true. Amazon, Walmart, and UPS each published case studies touting major productivity gains from GenAI warehouse bots. The Amazon and Walmart investments go beyond piloting to scaled deployments. Status: Happened
Meanwhile, Figure is starting to get some pilots, but it still seems like very early days. So far, my January take that humanoid robots won’t be deployed this year seems correct. Status: Likely to happen

10. RAG Fades as “Unlimited Memory” Rises

RAG (Retrieval Augmented Generation) is a common technique allowing GenAI models to search across large datasets.[4] One of the most exciting developments in the last few months is OpenAI’s new connectors. These allow users to run Deep Research and other queries on their own data. It’s a big improvement, and I’ve found the results to be very good. That said, it’s just using an advanced form of RAG that uses caching and possibly some agentic “thinking” to make the process more efficient. I’m not seeing much evidence that we are moving away from RAG, just that RAG 2.0 will be smoother than RAG 1.0. Status: Probably won’t happen

11. No Major U.S. AI Regulation

As you are probably aware, the Senate is currently debating a reconciliation bill called the One Big Beautiful Bill Act (OBBBA). One of the provisions in the House version is a 10-year ban on states regulating AI.[5] The Senate version doesn’t have an outright ban but instead ties state AI regulation to federal funding for broadband programs, which is effectively a moratorium like the one in the House bill but is compliant with the reconciliation rules. Is a regulation saying that there can’t be regulation itself a form of regulation? I’m going to say that it is not, and that it points to a low likelihood of regulation in the future. Status: Very likely to happen

If you are keeping score, that means out of 17 predictions:

6 have already happened
3 look very likely
4 look likely
3 are still possible
1 probably won’t happen

Looking ahead

I’m sure there will be lots of interesting developments over the rest of the year. I’ll be watching especially carefully to see whether:

Call center and sales team adoption continues to accelerate
AI is able to make a leap forward in science

I’ll also be watching whether the other trends mentioned above continue apace. If you have your own predictions or questions about mine, leave a comment or drop me a note.

[1] I’ve gotten a little recent criticism that my predictions weren’t bold enough since a bunch have already happened. Next time, I’ll try to rate these based on likelihood of happening.

[2] To create a first draft of this article, I gave my original piece to o3-pro and asked it to go through and score my predictions. It took 10 minutes and surfaced a few things, like the AI commercial, that I had missed. That was not possible in January.

[3] When I first saw this, I honestly thought it was a parody, but it seems to be genuine.

[4] Here’s a refresher on how this works. First, the system “vectorizes” the data. That means that it breaks the data into pieces and then maps them across a lot of dimensions. Those dimensions could be things like the industry, the company being discussed, the type of document, the sub-sector of the company, etc. Then, when someone does a search, a retriever grabs the most relevant passages from the vector database and feeds them to the model, which then writes an answer using that information. The result is citation-friendly responses with far fewer hallucinations. The downside is that it might not pull in ALL the relevant information. Also, GenAI is being used on the data that gets put into the model, but the vector matching is not exactly GenAI.
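
For readers who like to see the moving parts, here is a minimal sketch of the vectorize, retrieve, and prompt steps described in the RAG refresher. It is a toy under loud assumptions: simple word counts stand in for the learned embeddings a real system would use, the function names (vectorize, retrieve, build_prompt) are hypothetical, the sample passages are made up for illustration, and no actual model is called. It only builds the prompt that would be handed to one.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(passages):
    """The 'vector database': each passage stored next to its vector."""
    return [(passage, vectorize(passage)) for passage in passages]


def retrieve(index, query, k=2):
    """Return the k passages whose vectors are closest to the query vector."""
    query_vec = vectorize(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


def build_prompt(query, passages):
    """Number the retrieved passages so the model's answer can cite them."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


# Illustrative passages only; a real system would index many documents.
passages = [
    "Waymo provides paid robotaxi rides in several US cities.",
    "Aurora operates driverless trucks between Dallas and Houston.",
    "OpenAI cut the price of o3 inference in June.",
]
index = build_index(passages)
question = "Which company runs driverless trucks in Dallas?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

Because only the top k passages reach the model, anything relevant that ranks lower is simply never seen, which is the "might not pull in ALL the relevant information" downside in miniature.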

[5] I think this is probably a good idea. Having different regulations in different states would be very hard for companies to work with and would reduce innovation. Of course, I’d love to see this moratorium coupled with some commonsense federal rules, which I don’t think will happen.

Jarred Devar

Thanks for another great article Richard…impressive hit rate on your forecasts.

One potential watch out on Call Centre (a space I am particularly interested in as it seems to be a great real world playground) - I think the Verizon example refers to 20m interactions vs 20m calls. An interaction could be any point of contact (incl. digital/ SMS) and would not necessarily have originated as a call. Also assume they will have a mixture of a rule based system + GenAI with a human escalation point - so some of the interactions could end in calls as well. Overall though will definitely have lowered call volumes but harder to say by what. I think the Klarna example points to the hybrid world for the short term (AI + human escalation point).

On the agentic front - I remain less optimistic but hope I am proven wrong. At least the Operator product has made its way to Europe.
