
Superintelligence?

A Critique of AI 2027

Watch the video here: AI 2027 – Speculative Futures and Superintelligence

Although its mission is important, I found this video, and the overall project of AI 2027, wildly speculative and a tad dramatic (which I guess is the point).

Two main things are keeping us from the predicted future: Context and Compute.

These two gaps are perhaps the most serious (in a good way?) threats to the claims of AI 2027. Superintelligence won't exist until we solve the context problem and get past the compute plateau.

The Unfortunate Need for Understanding

Before diving into my critiques of this AI 2027 video, I need to provide you with some intuition surrounding how AI models learn.

Ok, so how does an LLM work? Well, simply, an LLM predicts the next most likely word, over and over, to build a sequence. Think of it like a function f(x) = y, where x is all the words preceding y (the predicted word). Now, what are model weights? Well, let's update our function a bit. Instead of f(x) = y, we have kx = y. A weight is the value of k that maps our input x to a prediction y.
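That "predict the next word, over and over" loop can be sketched with a toy stand-in for the real network. Here the "model" is just a lookup table of which word followed which in a tiny corpus; the corpus and the `predict_next` helper are invented purely for illustration:

```python
# Toy sketch of next-word prediction: the "model" is just a lookup
# table of bigram counts, standing in for a real neural network.
from collections import Counter, defaultdict

corpus = "the quick brown fox jumps over the lazy dog".split()

# Count which word follows which (our stand-in for learned weights).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(context):
    """f(x) = y: pick the most common continuation of the last word."""
    last = context[-1]
    if last not in follows:
        return None
    return follows[last].most_common(1)[0][0]

# Generate a sequence by repeatedly predicting the next word.
sequence = ["the"]
for _ in range(4):
    nxt = predict_next(sequence)
    if nxt is None:
        break
    sequence.append(nxt)

print(" ".join(sequence))
```

A real LLM replaces the lookup table with billions of weights, but the outer loop is the same: feed the sequence so far back in, get one more word out.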

Remember, y is the next word predicted, and x is the preceding words (or context). The "weight" here is k - we want to update k to best predict what y should be. What that means is that to train this model, we have a dataset with some correct {x, y} pairs. In the case of a language model, x would be the context, and y would be the predicted word. Then, you train your AI by rewarding it when it gets the correct (or close to the correct) y from your input x. Throughout the process, you continuously update that "k" value (model weight) to get closer and closer to the correct y.

For instance, let's say we had the pairs {1, 2}, {2, 4}, and {3, 6}, and we wanted to fit our equation kx = y. After training, our model would converge on a k value of 2. Simple pattern recognition (not in the way humans do it, but through repetitive reward/punishment).
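A minimal sketch of that training loop, assuming made-up {x, y} pairs and a hand-picked learning rate: gradient descent nudges k a little after every example until kx matches y.

```python
# Minimal sketch of "training": recover the weight k from {x, y} pairs,
# where the whole model is just y = k * x.
pairs = [(1, 2), (2, 4), (3, 6)]  # correct {x, y} examples

k = 0.0    # start with an uninformed weight
lr = 0.05  # learning rate: how big each update step is

for _ in range(200):          # many passes over the data
    for x, y in pairs:
        pred = k * x          # model's guess for y
        error = pred - y      # how wrong we were
        k -= lr * error * x   # nudge k toward the correct answer

print(round(k, 3))  # → 2.0
```

Real training does exactly this, just with billions of k values at once and a much fancier function than kx.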

But what about this "transformer" thing? Well, this architecture was really the first great breakthrough for AGI, or as I would call it, ATP (Advanced Token/Word Prediction). A transformer provides a mechanism called attention, which allows AI models to become dramatically better at "predicting the next word." Put simply, instead of feeding plain words in as our "x" input, we can now feed in something more sophisticated. I think attention is best explained through examples.

We have this sentence:

The quick brown fox jumps over the lazy dog.

Now, before transformer models, we would input this sentence with each word carrying its own fixed, individual meaning. Attention allows words to modify each other: quick → fox, brown → fox, lazy → dog.

Ok, but what do I mean by "modify each other"? Explaining this requires a quick interlude. Previously, I said that in our LLMs, the "x" value is simply the words preceding our intended "y" value (predicted word). This is true. However, computers can't do math with words. We need to convert these words into number vectors that can be used in model training.

Attention gives us a way to better vectorize text. When you create a text vector (or, to be more specific, a word embedding) using the attention mechanism, it transforms the word into a vector of numbers shaped by the context around it.

For instance, the word "bank" would get a very different embedding in "river bank" than in "bank account," because attention lets the surrounding words shift its meaning.
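To make this concrete, here is a stripped-down sketch of scaled dot-product attention. The 2-D "embeddings" are invented for illustration, and (unlike a real transformer) the queries, keys, and values are all just the raw word vectors, with no learned projection matrices:

```python
# Toy scaled dot-product attention: each word's vector becomes a
# context-weighted mix of every word's vector. The 2-D "embeddings"
# below are made up purely for illustration.
import math

words = ["quick", "brown", "fox"]
vecs = {"quick": [1.0, 0.0], "brown": [0.9, 0.1], "fox": [0.0, 1.0]}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(word):
    """New embedding for `word`: a weighted average of all word
    vectors, weighted by how strongly `word` attends to each."""
    q = vecs[word]
    scores = [dot(q, vecs[w]) / math.sqrt(len(q)) for w in words]
    weights = softmax(scores)
    return [
        sum(w * vecs[other][i] for w, other in zip(weights, words))
        for i in range(len(q))
    ]

print(attend("fox"))
```

The output vector for "fox" is no longer its original [0.0, 1.0]: it has absorbed some of "quick" and "brown," which is what "words modifying each other" means numerically.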

Critique

Now, equipped with this (admittedly simplistic) understanding of how an LLM does word prediction, let's dive into some of my main critiques of the AI 2027 video. First, I want to be clear that I find this "dramatic" content somewhat important. As with any technology, if we don't slow down and analyze the ethics of AI, we may drastically underestimate its danger.

Context

There are two things keeping us from the future imagined in this video. The first is context. For superintelligence, there needs to be a way to essentially allow for infinite context. Recall, context is the information passed into our model to get the predicted word we want.

Right now, the model with the largest context window is Magic.dev's LTM-2-Mini (100 million tokens). This is far, far away from the context window we need for superintelligence. In the simplest terms possible, AI right now is forgetful. AGI cannot be limited by context. But that means there must be a way to store and process vastly larger amounts of information as context (way more than 100 million tokens). And as of right now, we aren't even close to having that capability (either computationally or in terms of actual memory).

So, other than being how LLMs actually generate content, why else is context so important? Well, we have to understand how "AI agents" work. An agent chains many model calls together, carrying the results of earlier steps forward as context for later ones; the longer and more complex the task, the more context it burns through. All these agents will need much larger context windows. But say we manage to find a way around this (which I don't doubt is within reach). That would be the next big breakthrough for LLMs → a way to simulate infinite context.

Compute

We can't make our models better by just feeding them more data and computing power and praying. They will plateau. Our "k" values are already close to the best they can be for this architecture; more time spent training on more data yields diminishing returns, providing only a marginal increase in model quality.

The next big breakthrough won't be due to computing power. It'll be increased context abilities (which does, I suppose, have some reliance on power). After that? A new architecture - something to succeed transformers → A "nuclear bomb" to the mere gunpowder of the transformer model. We may be close to this as well.

But nonetheless, with our current technology, I expect a rather sudden AI plateau. Or not? To be honest, who knows if OpenAI is already close to a nuke and still giving us peasants gunpowder to play with? Democratizing AI will be important (the video is spot on here). In the end, the point is simply to approach AI development with caution.

But the world is ending in two years? Yeah, I don't think so.

More Reading