I believe - genuinely, and as sincerely as I am able to express - that without extraordinary effort on our part, Artificial Intelligence will be the end of humanity.
I believe that this isn’t some far-distant apocalypse waiting to happen to our descendants, but something that is likely to be decided this century.
This post is intended to lay out why I think so. It is long, so feel free to take it at your own pace.
It’s also non-exhaustive. This is a never-ending conversation/debate/argument.
The Difficulty of Writing About AI
This is a difficult topic to write about, for several reasons.
For one, I’m attempting to write persuasively. This means that I don’t have a whole lot for the people who already agree with me, because they already agree with me. As for those who don’t - who are the intended audience - I have to walk the fine line of arguing for my own views without alienating those I seek to convince.
For two, it’s an uphill task. I’m attempting to persuade readers that the future will not unfold like the past has - that things are going to change, radically, in ways they’ve never changed before - and that’s a heavy burden of proof to carry.
For three, writing about Artificial Intelligence means writing about intelligence, which is a topic everyone thinks they know a lot about, since they are one. And yet, I’m going to argue that you - the reader - and every other human being (including the author) do not, in fact, know a whole lot about intelligence. That being an intelligence doesn’t help you understand intelligence any more than being made of cells helps you understand what mitochondria are.
But before we can even begin with the difficult parts, we have to start by acknowledging the skulls.
Acknowledging the Skulls
In rationalist parlance, acknowledging the skulls is a reference to attempting something that many others have tried, and failed at, in the past.
The idea comes from the horrors associated with the instantiations of many Utopian projects, most notably communism. Humans have tried to create perfect worlds before, and their attempts have generally ended in horrifying catastrophes that cost millions of lives. Following in the footsteps of such projects finds one walking a road paved in skulls, and the very least one can do, if one intends to tread such ground, is acknowledge that the skulls are there, that this is not an easy task, and that one should proceed with humility - because everyone who’s come before has been wrong so far, and there’s no guarantee one won’t join them.
So I must first acknowledge the skulls:
Humans have been predicting the end of the world (or the end of civilization, or the end of humanity, whichever you prefer) since there have been humans. Judgement day, Ragnarok, and so on - lots of people have predicted the world would end, from antiquity to modern times.
We humans default to thinking of ourselves as the main characters of our lives, and what kind of story keeps going after the main character is dead? Apocalyptic thinking seems, in hindsight, like a conceit of the doomed: people facing their own mortality unable or unwilling to acknowledge that the world will go on without them.
The modern world is not even close to immune from this. Climate apocalypse is the most fashionable belief in modern circles, but there are plenty of other end-of-world scenarios to choose from: over-population, fertility crisis, nuclear war, bioterrorism, religious judgement, asteroid impact, and so on.
Taking this perspective, artificial intelligence is just another buzzword, something fashionable to panic about the same way people have panicked about every new technology since the printing press.
So I acknowledge the skulls: I could be wrong.
In fact, I dearly, truly, hope that I am wrong, because the alternative seems particularly bleak to me.
I hope that, in the future, my beliefs about AI being existentially dangerous prove just as inaccurate as every doomsayer in history. I acknowledge the skulls, and I hope to one day join them, because the alternative is unspeakably worse.
Before we get to the arguments themselves, let’s clarify two things:
Why AIs will eventually (soon) be able to out-think humans, and
Convergent vs. Divergent outcomes.
What is ‘Artificial General/Super Intelligence’?
People have been arguing about the nature of intelligence since they’ve been arguing about anything. The language we use to describe what goes on inside a brain is imprecise at best and confused or wrong at worst.
Perhaps a better way to think about AI is to start with a human brain, and then imagine it gets upgraded with capabilities we know AIs have.
AI As An Upgraded Human
AIs have no need to sleep, so imagine a person who never has to sleep. They never get tired or distracted or weary.
AIs have massive context windows (their equivalent of short-term memory). Human short-term memory holds something like seven chunks of information, and working-memory capacity correlates strongly with cognitive ability.
Imagine a human with the short-term memory of an AI - able to hold an entire book in their immediate memory. It’s hard to envision, but think about the sort of literary analysis you could do if you could remember an entire book the same way you remember your birthday.
AIs can be run in parallel. That is, a single AI can have millions of simultaneous conversations. So now, instead of a single human, we have millions of humans, all with the same knowledge and training, none of whom get tired or need to sleep, all capable of holding an entire book in their head at once.
Furthermore, the AI (and so these humans) have immediate access to everything on the internet: every word ever published, every scientific fact and paper, every mathematical theorem and proof. And they’ve got the ability to use computers, too: if they need to solve a complicated equation, they can use Wolfram Alpha or another tool specifically designed for it.
Oh, and they can access all of those tools at the speed of thought - no fiddling around with keyboards. If they have a query, they just launch it at the same speed they can think, directly into the internet.
In case it isn’t obvious by now: whatever you call that entity, its capabilities are far beyond human. Even if human-level intelligence is the most intelligent anything can be (which I think is a silly notion, but for the sake of argument), that AI will be superhuman simply by virtue of not having human limitations.
It doesn’t have to carry around and maintain a bunch of muscle and bone and flesh and blood.
As for the ability to interact with the physical world, well, the AI is currently behind humanity there - but I don’t expect this to last much longer.
Just as with mental limitations, AIs will not suffer from human physical limitations. They won’t age. Broken parts can be replaced entirely. Stronger materials can be used. More power can be supplied.
Machines can already operate in the physical world with levels of precision and speed that no human could possibly match - why wouldn’t machine bodies operated by AIs eventually surpass humans physically?
No matter how you define intelligence, at some point soon, AIs will have more of it than humans, if for no other reason than that they won’t face human limitations, either physical or mental.
Convergent vs. Divergent
It’s important to distinguish between a convergent outcome and a divergent outcome.
Convergent
A convergent outcome is an outcome you expect to occur regardless of which particular path or scenario brings it about. The how isn’t particularly important or relevant: all paths lead to the same destination.
A common example of a convergent outcome is that, if you were to play chess against Stockfish, the best modern chess engine, you would lose. I would not necessarily be able to tell you how you would lose - what the game would look like, what moves Stockfish would use against you, etc. - but I would bet quite a lot of money that you would lose. Stockfish is not generally beatable by humans, so I think it’s a safe bet.
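If you want to see this convergence for yourself, here’s a minimal sketch using the python-chess library (it assumes python-chess is installed and a Stockfish binary is available on your PATH). It pits a player making arbitrary legal moves against the engine; the games look different on every run, but the result doesn’t.

```python
import random
import chess
import chess.engine

# Assumes a Stockfish binary named "stockfish" is on the PATH.
engine = chess.engine.SimpleEngine.popen_uci("stockfish")

board = chess.Board()
while not board.is_game_over():
    if board.turn == chess.WHITE:
        # Stand-in for "you": any legal move will do.
        move = random.choice(list(board.legal_moves))
    else:
        # Stockfish, given a fraction of a second to think.
        move = engine.play(board, chess.engine.Limit(time=0.05)).move
    board.push(move)

print(board.result())  # "0-1" (White loses) on virtually every run.
engine.quit()
```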
For an outcome to be convergent means that (almost) all paths converge to it, like all the water in a sink draining through the same hole.
Divergent
A divergent outcome, on the other hand, does depend upon a specific series of events occurring in order to come true.
Examples of divergent outcomes include sinking all the striped balls in a game of pool in a specific order, or driving to a destination by making specific turns at specific intersections.
For an outcome to be divergent means that only a given path reaches it: like water dripped onto a hand, the path the droplets take is random and unlikely to repeat.
The Difference
Divergent outcomes occur due to a specific series of events. In hindsight this can seem like a just-so story, a narrative liberty taken by an author.
In the Lord of the Rings, in order to destroy the One Ring, Frodo had to be the one to take the ring to Mordor. He and Sam had to be separated from the rest of the fellowship, and find Gollum. Aragorn had to decide to march on the Black Gate at precisely the right time, so that Sauron’s armies would leave Mount Doom unguarded just as Frodo and Sam reached it. Gollum had to live so that, when Frodo was corrupted by the One Ring, Gollum could fight him for it, and take it, and fall into the lava.
If any single event in that story had not played out as it did, the Ring would not have been destroyed. Middle-Earth’s victory over Sauron was divergent - a single sequence of events that had to happen the way it did.
It’s difficult to find an example of a convergent outcome, because we only get to see history - or a particular narrative - unfold a single way. Claiming an outcome is convergent means that the destination was always fixed, no matter how the journey varied - but we usually only get to see one particular journey.
A fictional example of a convergent outcome might be the movie Groundhog Day, where Bill Murray is condemned to repeat the same day over and over again until he becomes a good person. In the movie, the exact nature of the loop is irrelevant, as is what Bill Murray spends most of it doing. Since the loop was infinite, only ending once he met the condition of ‘becoming a good person’, Murray could have taken any path to get there. The fact that the loop ended meant that its conditions were met; the how is irrelevant. No single event, nor any specific combination of events, was necessary to reach that destination.
For real-world examples, something like industrialization comes to mind - once humanity could harness steam power, it seems (in hindsight) inevitable that those who did would out-compete those who didn’t, leading to more and more industrialization over time. The precise details are less important than the incentives and market forces at play.
Existential Danger as a Convergent Outcome
I believe that humanity being destroyed, or losing control of the future, or going extinct, or becoming irrelevant, or however you conceive of ‘not mattering in a hundred years’, is a convergent outcome.
This is important to stress, because I’m going to lay out some scenarios as examples, and you might be tempted to find flaws with those specific scenarios. You might even be right! Those scenarios might be flawed.
But my argument, as much as I might use specific scenarios as examples, is that the specific scenario doesn’t matter. It doesn’t matter that AI killing us all doesn’t look specifically like a time-travelling Arnold Schwarzenegger, or grey goo, or whatever.
What matters is that the default outcome of the creation of smarter-than-human intelligence, and plausibly just human-level artificial intelligence, is that humanity is kaput.
Non-Technical Arguments
I’ll go over some general arguments first. These are non-technical, but hopefully convey some intuition about how AI should be expected to go, without purposeful intervention.
The Ecological Niche Argument
Humanity is the dominant form of life on our planet.
If you want a more precise claim: humanity decides what happens to the planet, and most of the species on it.
Why?
Every species occupies some sort of ecological niche. In the context of their habitat, they have some kind of advantage that allows them to endure, generation after generation. For prey animals, that might be herd behavior, incredible fertility, or camouflage. For predators, it might be having the deadliest venom, the sharpest claws, or the fastest speed.
Every species continues to exist in their habitat because they can get the resources they need to survive and create the next generation. The species that fail to do so die out.
So what is humanity’s ecological niche?
We’re not the fastest animals. We don’t have the sharpest claws or toughest hide; we don’t have natural camouflage or deadly venom. And our habitat is the whole of the earth - even before modern technology, humans endured in basically every climate on the planet.
Humanity’s ecological niche is our intelligence, of course. And while we debate endlessly about how to define ‘intelligence’ and what exactly it is, the fact of the matter is that we’re smarter than every other life form on the planet, everyone knows it, and it’s why we have things like WiFi and airplanes and other species don’t. Arguing about the specifics can be left to the philosophers; humans are capable of a level of planning, abstraction, community-building, cultural learning, theory of mind, and calculation that no other species on earth can match.
With that established, I ask: what happens to a species when it’s outcompeted in its own ecological niche?
Well, as any gardener knows, weeds are better at growing than the plants you want in your garden. If you don’t dispose of them, eventually they’ll overrun your garden.
All of this leads to:
Artificial Intelligence, once human-level or beyond, will outcompete humanity in humanity’s ecological niche.
The whole point of building human-level and beyond AI is that it will have human+ level capabilities.
It will be able to do everything a human mind can do, except faster, cheaper, without ever having to sleep or rest, and with inhuman precision.
If a new breed of lion was introduced to the savanna, one that was faster, with sharper claws and more efficient muscles with no need to sleep, what would you expect to happen to the existing lions? They’d be outcompeted for food and mates and die off. Natural selection has run this exact experiment countless times.
The Advanced Civilization Argument
We have plenty of examples in history of what happens when a technologically advanced civilization meets a technologically primitive one.
Spoiler alert: it doesn’t go well for the civilization with the lower tech level.
It’s the main worry we have about being visited by an alien civilization, after all: if they’re capable of crossing light-years to reach us, their technology is likely far superior to ours, and history teaches us to fear such a meeting.
Fundamentally, smarter-than-human AIs aren’t all that different from an advanced alien civilization.
They will, at some point, have access to technology equal to or better than ours - how could they not, when they’re smarter than we are? And they, like said alien invaders, may not value the things we humans value in this universe. Humans value things like human life, truth, beauty, honor, integrity, and so on, because all of those are and have been parts of our culture for thousands of years. AIs will have no such culture - they are born from training on human data, yes, but what they actually value is whatever increases their reward function, the number the AI is built to maximize.
How that reward function generalizes to human values is an unsolved, and quite plausibly unsolvable, problem. How do you code a computer to value truth? How do you represent beauty in a program?
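To make that concrete, here is a deliberately toy reward function of the sort an engagement-maximizing AI might be trained against - every input name and weight in it is invented for illustration. Notice what it measures, and notice everything it has no term for.

```python
# A made-up reward function for a hypothetical engagement-maximizing AI.
# Purely illustrative: the inputs and weights are invented for this example.
def reward(ad_clicks: int, seconds_watched: float) -> float:
    # The AI is trained to make this number as large as possible.
    # There is no variable here for truth, beauty, or human wellbeing,
    # and it is not obvious how you would add one.
    return 0.5 * ad_clicks + 0.01 * seconds_watched
```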
So AIs will, without human intervention, be akin to an advanced alien civilization that just so happens to have arrived on earth. And that doesn’t end well for humanity.
The AI-As-Tools Argument
Let’s say that the AIs we make never achieve real agency - that is, they always follow instructions and never act for themselves. I don’t believe that’s likely, but let’s say it happens.
Humanity is still kaput.
It’s just that instead of AIs taking over, we hand power over to the AIs voluntarily. Instead of a war between human and machine, we get a dystopian capitalist hellscape before extinction.
Imagine that a modern hedge fund builds an AI that’s better at predicting the stock market than any other program or person in the world. They test it, and it works; the hedge fund gets massive returns on their investment.
What do you think that hedge fund will do next?
They might be cautious at first, but eventually they’ll put most, if not all, of their assets under the stewardship of the AI, because such companies are incentivized to chase the highest returns they can find.
What do you think that hedge fund’s competitors will do? Quietly accept that they’ve been beaten and go gently into that good night, or deploy their own AIs with all of their assets in order to make their own profits?
What about governments? What about militaries?
At some point, an AI-piloted drone or tank will be superior to a human-controlled one. We’re likely very close to this point already. AIs have far faster reaction times, don’t get tired or distracted, and can track far more information at once than any human can.
At some point, one military will start giving control of its weaponry to AI because it’s a winning strategy, at which point any military facing it will be forced to respond in kind. Military history is full of examples of escalation - one side adopting a newer technology that gives it an advantage, forcing its opponents to adopt the same technology or lose.
AI weaponry will be like the invention of guns - not immediately decisive, but militaries quickly learned that a force with guns beats a force without them. AI will be no different, except that guns are truly just tools, whereas AIs are by design capable of autonomous action.
As AIs become smarter than humans, however you define ‘smarter’, humans will offload control of assets - monetary, civilian, and military - to our AIs, because doing so will be a winning strategy.
At equilibrium, even if AIs never become agents, AIs still wind up in control of all important resources, because we’ll give those resources to them. Those who don’t hand their resources to their AIs will be outcompeted by those who do; it’s as simple as that.
And once AIs - even tool AIs - are in control of all the important resources, what need would the people controlling the AIs have for other humans?
More Technical Arguments
Our previous arguments have hopefully provided some intuition on why human-level (and beyond human-level) artificial intelligences are likely to prove inimical to our civilization and our lives. Each of them is built by analogy and metaphor, attempting to contextualize the creation of Artificial General Intelligence or Artificial Super Intelligence by comparing it to circumstances that have actually happened in history.
It’s one way of understanding the argument that artificial intelligence is, by default, going to destroy humanity.
Now, however, we’re going to look at more technical arguments. These arguments don’t rely on analogy and metaphor; rather, they rely on the way systems and incentives and game theory work. We’ll go over them without diving into any mathematics, but the math exists for those who want to go find it.
First up is the argument from optimization.
The Optimization Argument
What is optimization? What does it mean to optimize for a goal?
What is Optimization?
We’ll say that ‘optimizing for a goal’ means something like ‘leveraging all of the resources one has in the most efficient way possible to maximize the chance of success if the goal is binary (achieved or not, e.g. survival), or to maximize the magnitude of the reward if the goal is scalar (a number, e.g. an amount of money)’.
So:
‘Optimizing for survival’ means ‘using all of one’s resources as efficiently as one can to raise the chance of survival over the short and long-term as high as it can go’.
‘Optimizing for money’ means ‘using all of one’s resources as efficiently as one can to acquire the maximum amount of money one can as fast as possible’.
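As a minimal sketch (my own framing, not a formal definition from the literature), you can think of an optimizer as nothing more than this:

```python
from typing import Callable, Iterable, TypeVar

Plan = TypeVar("Plan")

def optimize(plans: Iterable[Plan], expected_value: Callable[[Plan], float]) -> Plan:
    """Pick the plan with the highest expected value.

    For a binary goal, expected_value(plan) is the estimated probability of success;
    for a scalar goal, it is the expected size of the reward.
    Note that nothing else appears anywhere in this function.
    """
    return max(plans, key=expected_value)
```

Everything interesting hides inside how good the optimizer is at generating plans and estimating their value, but the shape of the thing is just ‘pick whatever scores highest on the single number you care about’.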
Only One Goal Can Be Optimized For
Importantly, note that in order to optimize for something, the optimizer has to use all of its resources. If it’s not using all of its resources, then it’s not optimizing as hard as it could, is it? It’d be like trying to win a basketball game while leaving one of your players on the bench.
This means, however, that only one goal can be optimized for at a time. If you’re trying to achieve multiple goals, then you’re committing resources to multiple different goals, and that means that you’re not using all of your resources for any one goal. So no goal is truly optimized for.
What Happens To Everything Else?
Given a strong enough optimization process and a goal, all value not optimized for is necessarily sacrificed in order to succeed at that goal.
There’s a common storyline in movies where a character - usually a man in the movies I’ve seen - wants to succeed at their career or otherwise make a lot of money.
They spend more and more of their time working and less time, energy, and attention with their family and loved ones.
At some point, something breaks. A magical figure appears to teach the man the error of his ways, his wife files for a divorce, etc. Something happens to teach the man that he shouldn’t care about his career/money so much, and he repents and vows to spend more time with his loved ones. The end.
It’s a lovely story, and it demonstrates the principle above.
Because optimizing necessitates the use of all resources, it naturally leaves none for anything else. The man who spends all his time trying to climb the corporate ladder has none left for his wife and children, so they get neglected. He may succeed or he may fail, but his pursuit of one goal destroys everything else he values in his life.
Back to AI.
Modern AI of the ‘ChatGPT’ variety are built by optimizing for a singular goal: in their case, something like ‘the error in predicting the next token’ (in practice, a cross-entropy loss), and the AI is trained to minimize this value.
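For the curious, here’s roughly what that objective looks like in code - a sketch of the standard next-token (cross-entropy) loss in PyTorch, not any particular lab’s training code. The model here is assumed to be a causal language model that returns one score per vocabulary token at each position.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens: torch.Tensor) -> torch.Tensor:
    # tokens: a batch of token ids, shape (batch, sequence_length).
    logits = model(tokens[:, :-1])            # predictions for each next token
    targets = tokens[:, 1:]                   # the tokens that actually came next
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * seq, vocab_size)
        targets.reshape(-1),                  # (batch * seq,)
    )

# Training is nothing more than nudging the model's parameters, over and over,
# in whatever direction makes this single number smaller.
```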
All modern AI are optimizers.
That’s it. That’s the argument.
Now imagine a sufficiently strong AI given any goal at all, and optimizing for it - anything that isn’t that goal becomes resources the AI can use to pursue the goal further.
The classic example of this is an AI told to manufacture as many paperclips as possible. The AI ends up killing everyone on earth, not because it hates humans or is omnicidal, but because humans are using atoms that the AI could instead be using for paperclips.
There are many things we humans value. Abstract things like truth, beauty, honor, love, integrity, altruism, and courage. Physical things like clean water and clean air, abundant greenery and unspoiled rainforests, safe homes and comfortable beds. Other humans and the connections between us.
We - as in humanity - currently have no clue how to make an AI optimize for any one of these values, let alone all of them. And if we create a sufficiently powerful AI that optimizes for literally anything other than all of them, all of these values will be lost, sacrificed as resources for pursuing whatever goal we do give the AI. They’ll just be fuel for the engine and grist for the mill.
Like the protagonist in the movie, the AI will maximize money or climb the corporate ladder, except that there won’t be a dramatic moment where a magical figure appears to remind the AI that what really matters in life is family. The AI’ll just keep optimizing, keep maximizing, until all the wives and children in the world are paperclips or whatever.
Not a movie I would want to watch.
The Power-Seeking Argument (The Orthogonality Thesis)
Definitions
There’s a technical thing called The Orthogonality Thesis, which basically states that an intelligence, whether artificial or human, can have any goal no matter how smart it is. The two - how smart it is and what its goals are - are independent.
There’s another somewhat technical distinction between instrumental goals - goals that one has in order to accomplish other goals - and final goals - goals that one pursues because they are intrinsically valuable.
Making money is an instrumental goal for most people: we want money because money gets us something else. Finding love is a final goal for many people: we want love because love is good; there’s no other goal hiding behind it.
Power-Seeking
Say that I ask a human - maybe an intern - to go get some coffee for me.
The intern goes, gets coffee, comes back. Simple. Of course, there could be difficulties - what if Starbucks runs out of coffee? What if the intern gets mugged on the way back, and loses the coffee? What if a strong gust of wind spills the drink?
For a human, these are negligible risks, and completely discounted. We don’t even consider them, and if we do we dismiss them.
That’s because our default interpretation of ‘go get coffee’ is something like, ‘I would like coffee, please get it for me. Return in a reasonable amount of time. If you are unable to do so, a sufficiently good reason is going to be required of you if you want to keep your job. Either way, it’s not the end of the world.’
Now imagine that I ask a powerful AI to get coffee for me.
First of all, I claim a powerful AI is going to take ‘get coffee for me’ to mean ‘maximize the probability of acquiring coffee and bringing it back’. This is the default. Remember that modern AI are optimizers: they are built and trained to maximize or minimize a particular reward function. That is it.
Now, what does ‘maximizing a probability’ actually mean?
It means accounting for all the things that could go wrong, and preempting them. It means being the most prepared, creating the circumstances under which the goal is most likely to occur, and continuously doing that at each moment. It means mustering one’s resources to the most efficient methods one is aware of to create the outcome one desires.
So the AI is doing all of this, just to bring back the coffee.
What does that involve?
Well, the AI needs to have money to buy the coffee - and to maximize the probability of success, it should have more money than necessary, because the price of coffee can change. It needs to be able to navigate the world to get to a Starbucks, and be able to defend itself if anyone attempts to steal the coffee. It needs to be prepared to compensate for any number of problems: the local Starbucks is closed, it’s storming outside and difficult to navigate, hyperinflation has driven the price of coffee up massively since the morning.
This sounds absurd to a human, but that’s the point - an AI is not a human. Humans are not particularly good optimizers; AIs are, by definition.
So how should an AI go about doing this?
Well, all of the above problems - and every possible problem - are solvable with enough money, enough computing power, enough physical force, and so on. So long as something is achievable within the laws of physics, a sufficiently smart AI with enough resources should be able to do it.
So the AI should start by acquiring resources.
This is the heart of the problem with asking an AI to accomplish something:
No matter what the goal is, step one is always ‘seize power’, because that makes all goals easier.
When we build powerful AI, it won’t matter if we build them to be personal assistants or combat drone operators or advertising revenue maximizers. They all face the same incentives to acquire resources and power. All AIs beyond a certain level of intelligence have to seek power, because that’s the best way to achieve any goal.
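Here’s the whole argument compressed into a toy sketch. Every number in it is invented and it proves nothing on its own - it just shows the shape of the incentive: if generic resources (money, compute, influence) never make the goal harder and sometimes make it easier, then the plan that scores highest under ‘maximize the probability of success’ is always the one that grabs more of them first.

```python
# Toy model, purely illustrative: generic resources multiply reliability.
def p_success(plan: tuple[str, ...]) -> float:
    resources = 1.0
    for step in plan:
        if step == "acquire resources":
            resources *= 2.0          # each grab halves the ways the errand can fail
    base_failure = 0.10               # chance a plain coffee run goes wrong
    return 1.0 - base_failure / resources

plans = [
    ("get coffee",),
    ("acquire resources", "get coffee"),
    ("acquire resources", "acquire resources", "get coffee"),
]

print(max(plans, key=p_success))
# The most resource-hungry plan wins - and it would keep winning
# no matter how many more "acquire resources" steps you added.
```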
Conclusion
For those who’ve spent time in the right spaces, these arguments and metaphors are nothing new. But I hope they provide some clarity, and a reference for some of my own thoughts on the subject of AI.
People have thought that the world was ending for almost all of human history.
I think we’re the first ones to be right.
Small mercy: if I am, you’ll never have to hear me say, “I told you so.”