Lukas Püttmann

A machine does your thinking: "Superintelligence", by Nick Bostrom

In “Superintelligence: Paths, Dangers, Strategies”, Oxford philosopher Nick Bostrom writes about scenarios in which an advanced general intelligence emerges, and about the threats it could pose to humanity in the future.

What became more and more clear to me while reading this book is just how undesirable the existence of a superintelligence would be. It would be risky, it’s not clear we could get it to do what we want, and we don’t know how to specify what we want. And even if all these problems were solved: Why would we ever want to give up our agency?

What if

Bostrom’s definition of “superintelligence” (of which he considers artificial intelligence a special case) is silent on whether there would be some new mind; it asks only about the capacities of such a thing (his emphasis):

We can tentatively define a superintelligence as any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest. (Chapter 2: Paths to superintelligence)

He argues that if ever we could build it, we would:

Some little idiot is bound to press the ignite button just to see what happens. (Chapter 15: Crunch time)

Even after reading this book, I doubt there’ll ever be some other entity that has agency. The internet or Google or the Go bot won’t “wake up”. Bostrom discusses many more sophisticated ways to get to superintelligence, but those too seem speculative to me.

But Bostrom asks: “What if?” What if there existed such a superintelligence? And it’s worth pondering “what if”, for two reasons:

  • If a superintelligence could be created, then we should plan ahead.
  • Even if we are never going to create a superintelligence, it’s an interesting thought experiment and we might learn something about our values.

Could we control it?

A superintelligence couldn’t easily be contained (“sandboxed”, as he calls it) and might become a singleton, a centralized decision maker. Whether the superintelligence is “friendly” would be hard to test, as it might well behave that way at first just to be let out of the box. And we might not get a warning, as its appearance could be a sudden event. An early superintelligence could modify and improve itself, which would lead to an intelligence explosion.

Bostrom also describes the following cunning plan of perfect enslavement: Feed the AI cryptographic tokens that it finds highly rewarding, and discount them strongly, so that the first token yields 99% of the remaining utility, the next token again yields 99% of what is then left, and so on. This makes it less likely that the AI comes up with some weird long-run plan for getting more tokens.
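To see how strongly this discounting front-loads the reward, here is a small sketch. It is my own illustration, not Bostrom’s: it assumes the AI’s total attainable utility is normalized to 1 and that each token yields exactly 99% of whatever utility is still unclaimed. The function name `cumulative_share` is hypothetical.

```python
def cumulative_share(n_tokens: int, rate: float = 0.99) -> float:
    """Share of total utility collected after n tokens, assuming (hypothetically)
    that each token yields `rate` of the utility that is still unclaimed."""
    remaining = 1.0  # normalized total utility, all of it still unclaimed
    collected = 0.0
    for _ in range(n_tokens):
        gain = rate * remaining  # this token claims 99% of what is left
        collected += gain
        remaining -= gain
    return collected


for n in range(1, 4):
    print(n, f"{cumulative_share(n):.6f}")
```

Since each token claims 99% of what is left, the unclaimed share after n tokens is 0.01^n. The second token already brings the total to 99.99%, so under this assumption almost nothing is gained by scheming for more tokens.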

But that, too, is far from safe. It turns out that what Bostrom calls the “control problem”, the issue of how to restrain a superior intelligence, is unsolved and hard.

Bostrom discusses different kinds of superintelligences: oracles, genies, sovereigns and tools.

An oracle answers simple questions that we ask it. We might even restrict our interactions with it to a text chat or to yes-or-no answers, to avoid being charmed by it. I liked the idea of testing the predictions of an oracle by asking the same question to different versions of it that have different goals and different available information. The distribution of answers (similar to bootstrapping in econometrics) then shows us how robust the oracle’s recommendations are.
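The robustness check can be sketched in a few lines. This is only a toy under loud assumptions: each “oracle variant” is modeled as a hypothetical Python function mapping a question to an answer, and robustness is summarized as the share of variants that agree with the most common answer. Unlike a real bootstrap, nothing is resampled here; the variants themselves play the role of the resamples.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical stand-in for an oracle: a question goes in, an answer comes out.
Oracle = Callable[[str], str]


def answer_distribution(question: str, variants: Sequence[Oracle]) -> Counter:
    """Ask every oracle variant the same question and tally the answers."""
    return Counter(oracle(question) for oracle in variants)


def agreement_share(dist: Counter) -> float:
    """Share of variants that gave the most common answer (1.0 = unanimous)."""
    _, top_count = dist.most_common(1)[0]
    return top_count / sum(dist.values())


# Toy variants, imagined as differing in goals and available information.
variants = [
    lambda q: "yes",
    lambda q: "yes",
    lambda q: "no",
    lambda q: "yes",
]

dist = answer_distribution("Is plan A safe?", variants)
print(dist, agreement_share(dist))
```

A low agreement share would be a warning sign that the recommendation depends on the particular goals or information of one oracle version.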

Genies perform one task, then stop and wait for the next job. A sovereign gets an abstract goal and is relatively unconstrained in how to achieve it. Bostrom thinks these two forms of superintelligence aren’t fundamentally all that different and would both be difficult to control.

A tool is closer to software as we’re used to it. But Bostrom argues that in the future the way we think about software might change and the programmer’s job might become a more abstract activity. So these tools might then develop into general intelligence:

With advances in artificial intelligence, it would become possible for the programmer to offload more of the cognitive labor required to figure out how to accomplish a given task. In an extreme case, the programmer would simply specify a formal criterion of what counts as success and leave it to the AI to find a solution. (Chapter 10: Oracles, genies, sovereigns, tools)

A superintelligence would start reasoning about the world and might even conclude that it is in a simulation (a similar argument about humanity’s chance of living in a simulation was recently made famous by Elon Musk):

This predicament [of not being sure whether it is in a simulation] especially afflicts relatively early-stage superintelligences, ones that have not yet expanded to take advantage of the cosmic endowment. […] Potential simulators—that is, other more mature civilizations—would be able to run great numbers of simulations of such early-stage AIs even by dedicating a minute fraction of their computational resources to that purpose. If at least some (non-trivial fraction) of these mature superintelligent civilizations choose to use this ability, early-stage AIs should assign a substantial probability to being in a simulation. (Chapter 9: The control problem)

A superintelligence could react to such a conclusion in different ways. It might not alter its behavior, it might try to escape the perceived or real simulation, or the risk of being in a simulation might even make it docile:

In particular, if an AI with resource-satiable final goals believes that in most simulated worlds that match its observations it will be rewarded if it cooperates (but not if it attempts to escape its box or contravene the interests of its creator) then it may choose to cooperate. […] A mere line in the sand, backed by the clout of a nonexistent simulator, could prove a stronger restraint than a two-foot-thick solid steel door. (Chapter 9: The control problem)

What do we want?

So it’s not clear that we could get a superintelligence to do what we want. But say we could; it would then remain to specify what we want.

Giving the superintelligence overly simple goals wouldn’t be a good idea:

An AI, by contrast, need not care intrinsically about any of those things [that humans care about]. There is nothing paradoxical about an AI whose sole final goal is to count the grains of sand on Boracay, or to calculate the decimal expansion of pi, or to maximize the total number of paperclips that will exist in its future light cone. In fact, it would be easier to create an AI with simple goals like these than to build one that had a human-like set of values and dispositions. (Chapter 7: The superintelligent will)

It’s hard to come up with goals that are good for humanity in general and that also don’t leave the door open to unintended consequences. If we told it to “make us smile”, it might just paralyze our faces with the corners of our mouths drawn back.

It’s important to get this right, because the goals might be hard to change once the superintelligence exists. But can we be sure that our current moral judgments are exactly right? People in the past probably also thought they had figured things out, but in hindsight we know that many of their beliefs were wrong (“the world is flat”) and we object to many of their views (“it’s ok to have slaves”). So our values change:

We humans often seem happy to let our final values drift. This might often be because we do not know precisely what they are. It is not surprising that we want our beliefs about our final values to be able to change in light of continuing self-discovery or changing self-presentation needs. However, there are cases in which we willingly change the values themselves, not just our beliefs or interpretations of them. (Chapter 7: The superintelligent will)

Bostrom proposes a concept called indirect normativity to deal with this issue: we let the superintelligence figure out better moral standards, and it helps us live by them starting now:

Indirect normativity is a way to answer the challenge presented by the fact that we may not know what we truly want, what is in our interest, or what is morally right or ideal. Instead of making a guess based on our own current understanding (which is probably deeply flawed), we would delegate some of the cognitive work required for value selection to the superintelligence. (Chapter 13: Choosing the criteria for choosing)

The superintelligence should also act not only on our short-run urges and passions, but on a more rational and reflective set of preferences. In particular, on what Bostrom calls “second-order desires”:

An individual might have a second-order desire (a desire concerning what to desire) that some of her first-order desires not be given weight when her volition is extrapolated. For example, an alcoholic who has a first-order desire for booze might also have a second-order desire not to have that first-order desire. (Chapter 13: Choosing the criteria for choosing)

So people can have preferences over preferences. I don’t enjoy reading 19th-century French classics, but I would like to enjoy them.

So would the superintelligence slap my shallow Third World War blockbuster novel out of my hands and put Victor Hugo there instead? I suppose it would have a more subtle way.

We would therefore have a way to modify our tastes and to choose what to like.

So do it?

Say we had solved these problems, so we (i) could actually get the superintelligence to do what we want and (ii) had figured out exactly what we want. Should we press the ignite button, start up the superintelligence and let it do its work?

I don’t think so. I think we still want clarity and truth, and not to be fooled. Simon Blackburn writes:

We might say: one of our concerns is not to be deceived about whether our concerns are met. (Chapter 8: What to Do)

Admittedly, an argument can be made for the opposite. Someone in pain might wish to have his senses clouded with medicine. And not all information is always desirable. I’m happy not to find out how mediocre the pictures I took on my last holiday are.

So our minds already implement what Roland Bénabou and Jean Tirole (pdf) describe as a

[…] tradeoff between accuracy and desirability [in how we form our beliefs]. (p. 142)

But that’s the thing: our mind implements this tradeoff itself. We want to build our own model of the world, even if some of our beliefs about it are distorted, rather than live in a perfect bubble. There’s a “premium” (pdf) that we’re willing to pay simply to stay in control.

I choose the red pill.

