As promised, here is the top half of my list: the top 5 AI stories of the year. If you missed it, part 1 is here. I’ve learned this week that there is some danger in writing a year-in-review article before the year is over. As you may have seen, OpenAI announced their “Twelve Days of Shipmas” last week, and we’re still in the middle of it. I’ve attempted to capture the latest announcements here, but it’s possible that my #1 point below will be out of date before the year is over.
My first post in January will be predictions for 2025, and at that point, I’ll be sure to write about the implications of any new announcements at the end of this year. As you’ll see, many of these stories relate to new developments in the foundational models, which are, of course, the source of much of the other innovation.
If you think I missed a key story or plotline this year, please email me or post in the comments.
5. The Rise of GPT Templates
Enterprise LLM solutions are becoming more and more popular among investors and other companies. Just giving employees a secure environment where they can use tools like ChatGPT without fear that confidential information will leak out into the world is a huge win for many companies.
That said, writing good prompts is hard. Even changing a couple of words can produce a different outcome. Obviously, one solution is to train employees on how to write good prompts. But another possibility that has emerged this year is “CustomGPTs.” These are essentially pre-written prompts that employees can use rather than having to start from a blank page. For example, a fund might have a custom GPT for analyzing a new management presentation or for coming up with questions to ask the management team. More sophisticated CustomGPTs that call various internal and external services through APIs allow for even more complex use cases. For example, a fund could build one that connects to its internal database of company data and lets users query it in plain English.
These tools let companies standardize how employees use the technology without requiring everyone to become a prompt engineer.
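To make the internal-database example above a bit more concrete, here is a minimal sketch of the plumbing that could sit behind such a CustomGPT, written against the OpenAI Python SDK’s function-calling interface. The company-lookup function, its fields, and the example question are hypothetical placeholders, not a real integration.

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical stand-in for the fund's internal data service.
def lookup_company(name: str) -> dict:
    return {"name": name, "revenue_usd_m": 412, "ebitda_usd_m": 97, "headcount": 1850}

# Describe the internal service to the model so it can decide when to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_company",
        "description": "Return key financials for a portfolio company from the internal database.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string", "description": "Portfolio company name"}},
            "required": ["name"],
        },
    },
}]

messages = [{"role": "user", "content": "How is Acme Widgets doing on revenue and headcount?"}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
call = response.choices[0].message.tool_calls[0]  # assumes the model chose to call the tool

# Run the lookup the model asked for, then hand the result back for a plain-English answer.
result = lookup_company(**json.loads(call.function.arguments))
messages += [response.choices[0].message,
             {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}]
final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```

In a real deployment, the lookup would hit the fund’s actual data service and the tool schema would mirror its fields, but the shape of the interaction is the same.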
4. Klarna’s Announcement About Replacing 700 Customer Service Agents with AI
This may be the news event that I heard referenced the most in 2024. Klarna announced that they had automated a big piece of their customer service and were able to divert two-thirds of inquiries to AI, reducing average conversation times from 11 minutes to 2. Further announcements from them have been even bigger (AI replacing 1,500 jobs), but it’s important to note that they aren’t laying anyone off but rather have curtailed future hiring.1
Still, this is one of the first big announcements from a company that used GenAI to drive a significant bottom-line impact. I’ve spoken to many investors and C-level execs this year who saw that example and realized that they needed to get serious about GenAI and figure out how to implement it rather than just dabbling and experimenting.
3. The Era of Multimodal Models
In 2023, one of my favorite talking points was that in order to enable high-value use cases, you needed to combine multiple modalities (text, video, images, code, etc.). Back then, this was a tall order, as it meant passing requests back and forth across different models, possibly even ones created by different LLM companies.
On a project in 2023, we built a tool that could take pictures of food and predict the number of calories. At the time, it was a huge deal because it required stringing together an image model to figure out what the food was and then passing that to a text LLM to look up the calorie count.
Well, not anymore. Now, almost every LLM provider offers a single model that can do it all, which makes it much, much easier to build applications like this. Today, I could write a CustomGPT in 20 minutes that does the calorie assessment, compared to the days it took in 2023 (which at the time felt like a miracle).
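To show just how much simpler the single-model version is, here is a rough sketch of that calorie app as one call to a multimodal model via the OpenAI Python SDK. The image path and prompt are placeholders, and the calorie figure that comes back is the model’s estimate, not a verified nutrition lookup.

```python
import base64
from openai import OpenAI

client = OpenAI()

# Placeholder path -- any photo of a meal would do.
with open("lunch.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# One call to a single multimodal model: it identifies the food in the image
# and estimates calories, with no separate vision model in the pipeline.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "What food is in this photo, and roughly how many calories is the portion shown?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
```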
This point is slightly undercut by OpenAI’s release of Sora this week. Sora is a video generation model that can make some very neat video clips, but it is currently hosted on a separate platform with a separate login. Perhaps by sometime in 2025 it will be part of a multimodal solution too.
2. Thinking Models Point a Path to the Future
As I’ll discuss further in #1, this hasn’t been a great year for models getting smarter, but the exception is OpenAI’s o1 and o1-preview. As I wrote about, this “thinking model” approach has opened up a whole new way for models to improve without adding more parameters. The result is continued progress against some impressive benchmarks, such as advanced math and science tests.
I spent a bit of time with o1-preview and o1, trying some coding problems. I found that o1 did much better on hard problems than o1-preview, often solving them on the first try where o1-preview sometimes struggled to come up with a working solution even after multiple iterations. Weirdly, though, o1-preview was able to execute the code in the chat, while o1 just gave me the code to run in my own environment.2
Now, if we want a model to perform better, we can have it think longer. This creates an opportunity to improve performance across a wide range of tasks, and we’re going to need that because…
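For readers who like to see the knob itself: OpenAI’s newer reasoning models expose a reasoning_effort setting in the API, so “think longer” is literally a parameter you can turn up. Below is a minimal sketch, assuming the OpenAI Python SDK and API access to an o1-class model (the model name and parameter availability may differ by account and release).

```python
from openai import OpenAI

client = OpenAI()

question = "Prove that the product of any three consecutive integers is divisible by 6."

# Same model, same prompt -- the only knob changed is how long it is allowed to think.
for effort in ("low", "high"):
    response = client.chat.completions.create(
        model="o1",               # assumed model name; substitute whichever reasoning model you can access
        reasoning_effort=effort,  # more effort = more hidden reasoning tokens before answering
        messages=[{"role": "user", "content": question}],
    )
    print(f"--- effort={effort} ---")
    print(response.choices[0].message.content)
```

The trade-off is that higher effort costs more tokens and latency: you pay at inference time for quality rather than at training time for scale.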
1. Hitting the Ceiling with Foundational Models
The top story of the year is not what was achieved, but what wasn’t. Despite enormous anticipation, no foundational model launched in 2024 significantly outperformed GPT-4. This article about how OpenAI’s new model is delayed until 2025 is typical of reporting around this problem. (This is where I have to say that it’s completely possible that a new model will be released at the end of the Shipmas event next week. If so, I’ll discuss it in a post in January, and for now, you can just take this section and imagine it said the opposite.)
There’s an open question of whether the issue is that scaling laws stop working above a certain point, meaning that a 1-trillion-parameter model isn’t much better than a 100B-parameter model, or whether the issue is that we’ve run out of training data. Or it could be both.
Either way, we may have hit a real wall with the current approach. In that case, the thinking models will be the best way to make progress.
However, I don’t want to end on a down note. Entries 2-10 represent an enormous amount of innovation in just a year. If we never get a better LLM, those innovations working their way through the economy will still be a really big deal!
1. I think this is a great strategy. If you tell employees that AI will take the boring parts out of their jobs but no one will be fired, that goes a long way toward driving adoption of the tools. In businesses with high employee churn, the savings can still be realized in-year. Targeting outright human replacement creates the perception among employees that they are training a robot to take their jobs, which tends to be unpopular.
2. GPT-4o has a mode called Canvas that lets you edit and debug code while chatting with it. It feels like a real co-pilot. It will be amazing when/if o1 gets that mode.