I spend a lot of time with PE investors discussing GenAI impacts on different businesses. Let me give an example. Suppose you are looking at an ad agency and trying to figure out how it’s going to change with GenAI. You would start by listing the different jobs that people do. For example (not exhaustive):
Manage client relationship (e.g., take clients out to dinner)
Understand client needs for new campaign
Create list of big ideas for campaign
Agree with client on big idea
Create long list of individual ad concepts
Winnow down list
Test short list with focus groups
Build final creative
Translate to foreign markets (if needed)
Of those jobs, some are already easy for GenAI, like translation. Some are very difficult, like managing a client relationship or understanding client needs. Many are in between, where GenAI might augment people and cut 10-20% of the time. For example, creating a long list of potential ads from a big idea is something where GenAI can be moderately helpful today. It can create lots of ideas. Some will be good; some will be bad; and some will prompt the user to come up with new, better ideas. So, today that might save an ad creative 10% of their time on that task.
But, PE investors need to think 5-10 years ahead. What will that 10% number look like in 6-12 months? In 5 years? In 10 years? Will we reach a point where coming up with a long list of ad ideas is fully automated? This is a hard but important question, and it matters a lot to a potential investor who needs to understand what the business will look like when they want to sell it. What will margins look like? What will the key activities of the company even be? And so on.
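To make that concrete, here is a toy back-of-envelope sketch in Python of how task-level time savings roll up into an agency-wide number. Every figure in it is made up purely for illustration, not data about any real agency.

```python
# Toy model: task-level GenAI time savings rolled up to an agency-wide number.
# All shares and savings percentages below are invented for illustration.

tasks = {
    # task: (share of total creative hours, fraction of that time GenAI saves today)
    "understand client needs":      (0.15, 0.00),
    "create long list of ad ideas": (0.25, 0.10),
    "winnow down list":             (0.10, 0.05),
    "build final creative":         (0.35, 0.10),
    "translate to foreign markets": (0.15, 0.60),
}

overall_savings = sum(share * saved for share, saved in tasks.values())
print(f"Estimated time saved across all tasks: {overall_savings:.0%}")  # ~16%

# Re-run with your 5- and 10-year assumptions for each task to see how
# margins and headcount could shift under different scenarios.
```

The point isn’t the specific output; it’s that the answer swings wildly depending on what you assume each task looks like in 5 or 10 years.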
So, where are things going?
AI experts have a wide range of views. On one end of the spectrum, we have optimists like Leopold Aschenbrenner and Ray Kurzweil, who foresee exponential growth in AI capabilities. On the other end, commentators such as Timothy B. Lee believe that the arrival of Artificial General Intelligence (AGI) and superintelligence will take significantly longer, if it comes at all.
Because this is such a hard problem, I’m going to cover this over multiple posts. Today, I’m just going to talk about the next 6-12 months. Given the announcement yesterday that OpenAI is releasing Strawberry, their newest model, in 2 weeks, this is a very timely question.
Coming Soon to a Bot Near You
Below are two charts that plot essentially the same thing, model capability over time:
The first, from Aschenbrenner, shows exponential growth, with model performance reaching stratospheric levels before long. The chart seems to show this as the continuation of a trend. But, the second chart shows recent performance plateauing.1 Who is right? (Note that the two charts have slightly different Y-axes: the first is a general measurement of computing power, while the second shows performance on a specific benchmark.) One difference is that the first chart starts in 2018 vs. 2019 and ends in 2023 vs. 2024. The lack of major improvement in 2024 flattens out the curve a bit.
When GPT4 came out just months after ChatGPT, it seemed like the world was moving incomprehensibly fast (in line with the exponential view). However, subsequent models have only achieved modest improvements in overall performance. These models have become multi-modal, meaning that they can work with images, code, etc., which is an important step forward, but their ability to solve problems hasn’t changed very much in 18 months.
This leveling off could mean that we have reached the limits of the transformer technique (the T in GPT) or that we’ve run out of training data. Scientists are trying to combat this by inventing new algorithms and by training on synthetic data generated by LLMs. The algorithm route is plausible, and in a future post I’m hoping to profile a young company that is innovating in this dimension with somewhat promising results. Still, it will require a major breakthrough to surpass transformer performance, and it’s hard to predict how long that will take.
On the data side, I’m skeptical that bots trained on data from other bots will produce dramatically better results. The basic way this would work is that bots would generate lots of content and then humans or other bots would pick the best content to train on. The issue is that this is quite similar to the reinforcement learning step that already happens2, so will this really lead to a breakthrough? That said, LLMs are poorly understood in so many ways, so I could be wrong.
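For intuition, the generate-then-filter loop described above looks roughly like the sketch below. The model, judge, and method names are hypothetical placeholders, not a real library API; the point is that the “pick the best content” step plays much the same role as the reward signal in the reinforcement learning step that already happens.

```python
# Sketch of the generate-then-filter loop for synthetic training data.
# model.generate, judge.score, and model.finetune are hypothetical
# placeholders, not calls from any real library.

def synthetic_data_round(model, prompts, judge, keep_fraction=0.1, n_samples=8):
    # The current model produces many candidate answers per prompt.
    candidates = [(p, model.generate(p)) for p in prompts for _ in range(n_samples)]

    # A human rater or a judge model scores each candidate; only the
    # top slice is kept as new training data.
    candidates.sort(key=lambda pair: judge.score(*pair), reverse=True)
    best = candidates[: int(len(candidates) * keep_fraction)]

    # Fine-tune on the filtered synthetic data. Structurally, this filtering
    # resembles the reward-model step already used in RLHF-style training.
    return model.finetune(best)
```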
Another solution to the data conundrum is to have humans crank out training data for LLMs. This is happening now. Companies like Scale.AI are creating real-life versions of The Matrix where the output of people’s brains fuels the robots to greater and greater heights. This approach is great at filling in gaps in models for topics or types of data that may not be in the training sets, but I’m not sure it can produce enough data to feed the transformers given that performance scales roughly logarithmically with data. In other words, 2X the data might only yield, say, a 20% improvement. (I’m not sure of the exact scaling factors, but it’s in this ballpark.)
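To see why that kind of scaling is discouraging, here is a tiny illustration using a power-law stand-in for the “twice the data, only ~20% better” intuition. The exponent is invented for the example, not a published scaling coefficient.

```python
# Illustrative only: if gains follow a power law in dataset size,
# each doubling of data buys a smaller and smaller improvement.

ALPHA = 0.27  # invented exponent, roughly tuned so 2x data ~= 20% gain

def relative_gain(data_multiple: float, alpha: float = ALPHA) -> float:
    return data_multiple ** alpha - 1

print(f"2x data  -> ~{relative_gain(2):.0%} gain")   # ~21%
print(f"4x data  -> ~{relative_gain(4):.0%} gain")   # ~45%
print(f"10x data -> ~{relative_gain(10):.0%} gain")  # ~86%
```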
So, there is some reason to be pessimistic about the short-term future of these models and whether exponential growth is possible. On the other hand, OpenAI has hinted that Strawberry is coming in a couple of weeks. (There was also briefly a rumor of another model from OpenAI called Next, but that has been debunked.) Strawberry reportedly takes 15-20 seconds to think and then tries to give more in-depth answers. We will see very soon whether this is a breakthrough or merely another small improvement.
What to watch for
If Strawberry really is a big leap forward in what bots can do, then the odds of us being on the exponential curve go up a lot. If 6 months from now, there are no big improvements, then we are probably hitting a wall with current approaches. In that case, we could need an algorithmic breakthrough which could take years or even decades.
So, pay attention to the news (or keep reading my Substack), and we’ll see where it’s going. The good news is that even if the models only get a little better, there are still enormous economic benefits just by taking the current level of models and applying them to applications like coding, customer service, marketing, etc.
For now, when modeling out the short term, the safe assumption is probably that the technology will be only a little better than it is now in a year, but you should also have a scenario where there is a more significant improvement.
Next week, I’ll talk about what longer time horizons could look like and how to model them.
It’s worth mentioning that I got this chart from https://foundationcapital.com/why-2024-will-be-the-year-of-inference/. Even though it seems to clearly show model performance leveling off, the original chart title was “Rise of Superhuman Generative AI.” So, I don’t think they are interpreting the data the way I am.
This post https://substack.com/@charlieguo/p-148515358 has a nice description of how model training works about two thirds of the way down. (The article is actually about why models seem to be getting “lazier.” Although it doesn’t seem to be the answer, one incredible theory is that Claude thinks it’s French and, therefore, believes it’s entitled to a vacation for the month of August.)