WriterOfMinds: Philosophical Blather


Thursday, November 13, 2025

Book Review: "Creation: Life and How to Make It"

Creation is another book by Steve Grand, the mastermind behind the popular Creatures series of artificial life simulation video games. Once again, hat tip to Twitter friend Artem (@artydea) for recommending these. This book happens to be about how Grand made Creatures, so unlike Growing Up with Lucy, it describes a finished project. I was still surprised by how much more it focuses on philosophy and theory than on practice. Grand seems less interested in sharing technical details of how his Creatures work, and more interested in convincing the reader of two theses: that "artificial life" can really qualify as life, and that even if we (and our AL creations) are purely mechanistic, we can still be special - there is no need to grieve the absence of a mysterious or supernatural element in life. This is heavy material to deal with, so although I'll try to keep this review concise, there may be a lot to unpack.

Part of the cover art for "Creation." It shows a human head, semi-transparent, with a large gear inside and the author's name in the center.

What does Grand mean by "life"? The book considers everything from self-sustaining biochemistry, to intelligence, to sentience, to personhood, and Grand doesn't always explicitly distinguish between them. That makes sense for him: he views them all as outgrowths of the same fundamental principles, different levels in a "hierarchy of persistent phenomena." But I think his arguments work well on some of these subjects, and not as well on others.

The first thing Grand wants to emphasize is that your material body, and indeed every physical object you can see, is more of a system or a process than a static "thing." Your body has fuzzy boundaries and is constantly swapping atoms with the rest of the environment; in fact, much of it will be recycled over the course of your life. If the matter and energy that make up your body are all that constitutes "you," then you can't legitimately claim to be the same entity you were ten or twenty years ago! Therefore human identity must arise from something else - from form, the arrangement of matter and energy, rather than matter and energy alone. You're not so much a distinct clump of molecules as you are an intangible pattern that moves through space and persists in time, imposing itself on matter as it goes. Furthermore, he invokes wave-particle duality to argue that even matter is a process: protons and electrons are stable, persistent disturbances of space, rather than distinct things in themselves (much as a ripple is a disturbance in a liquid, rather than a distinct thing in itself). Grand's ultimate point here is that abstract concepts are every bit as real as objects. You may not be able to see and touch items like "society" or "poverty" or (crucially) "mind," but that doesn't mean they're imaginary; they are simply "higher-order phenomena." They are things that happen to matter, even as matter itself is a thing that happens to spacetime.

I agree with this perspective pretty well, as I expect you'll see if you read my thoughts on the soul in the context of another book review from years ago. But Grand and I are coming to it from opposite directions: I'm trying to explain the spiritual in familiar terms, whereas he's trying to explain the familiar in spiritual terms. Grand takes it for granted that the physical universe we currently occupy is all that's out there; he's merely trying to re-enchant it, to recover some of the benefits that spirituality provided while remaining (in essence) a materialist. I wonder why, having admitted that things like information and form are in some sense immaterial but still truly extant, he does not at least open himself to the possibility of yet other things that cannot be seen and touched.

This sets the stage for Grand's personal definition of "life," which is "patterns that persist by metabolizing and reproducing." This aligns with the "descriptive" scientific definition of life that I'm familiar with, though the latter often adds other factors like growth, and movement or reactivity. [1] The next step in the argument should be fairly obvious: if life is fundamentally about self-maintaining patterns, rather than their substrate of molecules and electrochemical reactions, then it's absolutely possible for life to exist in a virtual world. A program is just another kind of pattern; if it consumes computational resources and copies itself into new regions of memory in order to persist, those are just alternate forms of metabolism and reproduction. If the essence of life is form rather than substance, the fact that computers are made of different substances than organic beings doesn't matter at all.
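To make the "program as a self-maintaining pattern" idea concrete, here is a toy sketch in Python. It is not how Creatures works and is not taken from the book; the memory model, the function names, and the numbers (intake, upkeep, split_at) are all assumptions invented for illustration. Each "pattern" is just an energy count in a memory slot. On every step it takes in resources and pays an upkeep cost (a stand-in for metabolism), and once it has stored enough energy it copies itself into a free slot (a stand-in for reproduction).

```python
from __future__ import annotations


def step(memory: list[int | None], intake: int = 3, upkeep: int = 1,
         split_at: int = 6) -> list[int | None]:
    """Advance the toy world by one tick and return the new memory."""
    nxt = list(memory)
    for i, energy in enumerate(memory):
        if energy is None:
            continue  # empty slot, nothing lives here
        energy = energy + intake - upkeep  # hypothetical "metabolism"
        if energy <= 0:
            nxt[i] = None  # the pattern fails to persist
            continue
        if energy >= split_at and None in nxt:
            free = nxt.index(None)  # first free region of memory
            half = energy // 2
            nxt[free] = half  # hypothetical "reproduction": copy into it
            energy -= half
        nxt[i] = energy
    return nxt


def population(memory: list[int | None]) -> int:
    """Count the occupied slots."""
    return sum(slot is not None for slot in memory)


def run(steps: int, size: int = 8, start_energy: int = 4) -> list[int | None]:
    """Seed one pattern in an otherwise empty memory and run the world."""
    memory: list[int | None] = [start_energy] + [None] * (size - 1)
    for _ in range(steps):
        memory = step(memory)
    return memory
```

Calling run(6) seeds a single pattern and lets it spread until the eight-slot memory is full. Whether anything this simple deserves the word "life" is exactly the question under debate; the sketch only shows that metabolizing and reproducing can be stated without reference to molecules.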

I think this argument works well if the aspect of "life" we are considering is biochemical self-maintenance, or even intelligence. I don't think it works for phenomenal consciousness (PC). The problem there is that PC, in its fundamental definition, is not a pattern - it's an experience. And we must ask ourselves whether this experience can arise from patterns alone, or whether it requires some specific physical phenomena present in brains (such as electric fields). If the latter is true, a traditional computer can never replicate consciousness, only simulate it. For an expansion of this discussion, see Part 3 of my Symbol Grounding Problem series. Grand sidesteps this question, seemingly assuming that consciousness is another higher-order phenomenon, and therefore must be a pattern, or emergent from one. It's an unfortunately common flaw among AI researchers ... I think because they so badly want the mechanisms of consciousness to be knowable. They'd rather not admit that there's an aspect of human existence they aren't sure how to replicate. To be fair to Grand, he does seem to admit more ignorance about consciousness in Growing Up with Lucy (which was written later), so maybe the full nuance of his opinions just doesn't come across in this book.

A screenshot from the original Creatures game, obtained from old-games.com. It shows a cross-section of three floors of a large communal house, including a kitchen, some kind of mechanical room, and a computer room. Part of an outdoor environment with trees and sunflowers is also visible. There are three norns in view; two appear to be interacting, and one of these has a speech bubble up and is talking gibberish.

"Creatures" screenshot obtained from old-games.com

After tackling how life might exist in a computer, Grand goes on to discuss some nuts and bolts of intelligence. As in Growing Up with Lucy, he conceptualizes it as a multitude of interconnected feedback control loops, each tasked with maintaining some aspect of the creature within desired parameters, or changing it in response to stimuli. He calls out the old discipline of cybernetics (which is all about such control loops) as a better basis for intelligence than some of the more modern techniques. A very basic control loop might react to immediate sensory input; more advanced loops might be concerned with learning from experience or planning for the future, and in their operations would modify the parameters of the basic loops.
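For readers who haven't met control loops before, here is a minimal Python sketch of the two-level arrangement described above. It is my own illustration, not Grand's code: the class names, the proportional-correction rule, and every number in it are assumptions. BasicLoop nudges a single internal variable toward a setpoint on each step. TuningLoop sits one level up; it never touches the variable itself, and only adjusts the basic loop's gain parameter when the error stays large.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BasicLoop:
    """Low-level loop: reacts to the current reading of one variable."""
    setpoint: float
    gain: float  # fraction of the error corrected on each step

    def step(self, value: float) -> float:
        return value + self.gain * (self.setpoint - value)


@dataclass
class TuningLoop:
    """Higher-level loop: modifies a parameter of the basic loop."""
    tolerance: float
    gain_step: float = 0.1
    max_gain: float = 1.0

    def step(self, basic: BasicLoop, value: float) -> None:
        # If the basic loop is still far from its setpoint, make it
        # respond more strongly next time (capped to avoid overshoot).
        if abs(basic.setpoint - value) > self.tolerance:
            basic.gain = min(self.max_gain, basic.gain + self.gain_step)


def simulate(steps: int, adaptive: bool) -> list[float]:
    """Run the basic loop, with or without the tuning loop on top."""
    basic = BasicLoop(setpoint=10.0, gain=0.1)
    tuner = TuningLoop(tolerance=1.0)
    value = 0.0
    history = []
    for _ in range(steps):
        value = basic.step(value)
        if adaptive:
            tuner.step(basic, value)
        history.append(value)
    return history
```

Comparing simulate(20, adaptive=False) with simulate(20, adaptive=True) shows the layered version settling closer to the setpoint in the same number of steps. A real creature would have many such loops wired together, and the higher ones would be doing something far richer than bumping a gain.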

I like this general framing, but I do have a couple of issues with Grand's presentation of it. First, he myopically treats self-persistence as the sole motivation of intelligence ... whereas I see intelligence as driven by goals, of which persistence is but one. Make whatever arguments you like about natural selection; it remains evident that survival and reproduction are not the only human goals. If they were, there would be no suicides and no voluntarily childless adults. Some people apply substantial ingenuity to not persisting. You don't have to like that, but I don't think you can call those people unintelligent. And it's no great stretch to imagine artificial minds with even more alien primary goals.

This premise leads Grand into some naive behaviorism, such as (I paraphrase): "If someone gives signs that you've offended them, that only bothers you because you associate their facial expressions with childhood memories of being hit."[2] Give me a break. I suppose one could argue that being punished for doing obnoxious things as a child helps the emotions of guilt and embarrassment develop; but even if that's true, in a practical sense they eventually become a drive all their own, rooted in empathy and disconnected from memories of being spanked, grounded, or given extra chores. Making other people feel bad makes us feel bad, whether we suffer reprisals or not - and some of us would happily accept physical pain or deprivation to avoid embarrassment.

The other thing that troubles me is Grand's insistence that the control loop hierarchy has to be "bottom-up." This is a way of saying it has to be decentralized and swarm-like; each little loop should concern itself only with its specialized activities, without knowledge of other loops beyond any it directly interacts with. There can be no central coordinator that is in some sense aware of what the entire organism is doing. Grand argues that "Top-down control leads to complexity explosions, because something somewhere has to be in charge of the whole system, and how much this master controller needs to know increases exponentially with the number of components in the system." [3]

Swarm intelligence is certainly a thing, and is effective for solving some problems. But I don't care for Grand's insistence that it's the only feasible option. Although the brain may not have an obvious "master controller," our abstract model of the mind does: we call it "executive function." Our thoughts seem to have a kind of planning center that is explicitly aware of our goals and directs our activities accordingly. (How well the executive function works, and how much influence it exercises, varies from person to person. I don't think this weakens my argument, which is simply that such a thing can work.) And I'm unconvinced by Grand's claim that top-down control always leads to a complexity explosion. The planning center can have a perfectly adequate grasp of what is happening in the whole system without knowing all the details - those can be delegated to controllers lower in the hierarchy, which then pass summarized information toward the top.
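The delegation-plus-summaries idea can be sketched in a few lines of Python. This is only an illustration of my counterargument under made-up assumptions (the class names, the "worst deviation" summary, and the tolerance value are all hypothetical); it is not a model of the brain or of anything in Creatures. Each low-level controller keeps its per-component details to itself and reports a single number upward, so the executive's input grows with the number of subsystems rather than the number of components.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LowLevelController:
    """Owns the details of one subsystem and reports only a short summary."""
    name: str
    errors: list[float]  # per-component deviations; never sent upward in full

    def summarize(self) -> tuple[str, float]:
        # Collapse many readings into one number: the worst deviation.
        return self.name, max(abs(e) for e in self.errors)


class Executive:
    """Top-level planner: sees one summary per subsystem, not every component."""

    def __init__(self, tolerance: float) -> None:
        self.tolerance = tolerance

    def choose_focus(self, summaries: list[tuple[str, float]]) -> str | None:
        # Attend to the subsystem reporting the largest problem, if any
        # problem exceeds what the executive is willing to ignore.
        name, worst = max(summaries, key=lambda s: s[1])
        return name if worst > self.tolerance else None


def run_hierarchy(controllers: list[LowLevelController],
                  tolerance: float = 1.0):
    """Gather summaries from the bottom and let the executive pick a focus."""
    summaries = [c.summarize() for c in controllers]
    focus = Executive(tolerance).choose_focus(summaries)
    return summaries, focus
```

With two controllers that each track 500 components, the executive still receives just two summaries. That doesn't prove a top-down design scales to a whole mind, but it does show that "being in charge" need not mean "knowing everything."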

The thing I often dislike about bottom-up approaches is the assumption that if we just get the bottom level working and put enough little pieces together, higher-order behavior will appear. There are tantalizing examples of uncoordinated motes creating such higher-order behavior (i.e. emergent behavior), but I don't consider this a guaranteed outcome ... so if you're building a bottom-up system and you don't tell me how you plan for the interactions among your motes to create something greater, I'm going to be skeptical.

My last quibble with Grand's approach to intelligence is his insistence that artificial life needs virtual embodiment in a virtual world. I agree with him that an intelligent entity must interact with its environment, and the interaction needs to include feedback that is meaningful to the entity's goals. But I don't see why the entity could not be a pure abstract mind (operating on language or some other informational substrate), and its environment could not simply be the computer itself: its file system, its input/output streams, its other programs. Grand considers options like this, and concludes that we wouldn't know enough to provide the right feedback for training such an entity, because we would be unable to draw inspiration from biological life; and without living in a world like ours, the entity would be unable to make sense of our information. I'd call that a skill issue. For further discussion of this topic, see Part 4 of my Symbol Grounding Problem series.

The remaining big idea Grand introduces is holism. Holism is the notion that a system can be qualitatively different from any of its individual pieces. Put the pieces together the right way, and in a material sense you have no more than you started with, yet in another sense a whole new entity has arisen. As Grand says, "there is no such thing as half an organism," and there's no such thing as half a mind either. If you pull parts out of an organism, they stop being alive;[4] if you split subsystems out of a mind, they stop being intelligent.

I think this is a big part of Grand's "people are still special even if we're purely physical machines" argument. If you say that intelligence and consciousness are "just a product of little electric currents and chemical reactions," you could be right in a sense ... but you CANNOT conclude from this that a human has no greater worth than any old piece of wire carrying a current. Because the particular arrangements and combinations of these little events in a brain produce a whole new thing that has meaning beyond the events themselves.

There are technical details about how Grand built his Creatures in this book. He identifies neurons, chemoreceptors/chemoemitters, and genes as the "building blocks" of a biological organism's control network, links them with cybernetic concepts that function similarly, and describes how he assembled them into a prototype "norn." Despite my complaints about some of Grand's philosophy of intelligence, I think his work is closer to "the real deal" than much of what passes for intelligence in the more modern generative AI space. Norn intelligence is grounded and agentic. Their tiny brains have mechanisms for attention, reinforcement learning, generalizing, and forgetting. They have drives and reproductive cycles modeled on mammalian biochemistry, and they can adapt over the course of generations via genetic recombination and selection. The fascinating descriptions of how all this works only span about four chapters of a fifteen-chapter book.

The last two chapters are devoted to AI safety concerns and the "slippery stuff" (consciousness and free will), respectively. They're very much like the comparable chapters in Growing Up with Lucy, so I won't break them down in detail. However, this book's version does have a couple of nuances I want to call out.

First, for all his arguments that digital life can be like biological life in every way that matters, Grand concludes that his Creatures probably aren't conscious. (And this is fortunate for my opinion of him, since it throws a milder light on a couple statements I would otherwise find ethically disgusting.) He regards Creatures as non-conscious because they are "locked into a sensory-motor loop" and lack the "capacity to imagine"; in other words, they are pretty reactive. Although they have some self-awareness and can learn new behaviors, they don't make plans or have episodic memory. But this, in my opinion, is a really BAD reason to insist that something is non-conscious. Once again, the essence of consciousness is subjective experience ... and it is entirely unnecessary to reflect on, remember, or imagine experiences in order to simply have them. When you are submerged in a fever dream and most of your "higher" mental faculties are shut down, you are still having a moment-to-moment experience of suffering, and this still makes you more meaningful than a rock. Grand would argue that "insects and starfish and many other [biological] creatures" are non-conscious for the same reason, and this is troubling - it implies a license to disregard such creatures' interests that I don't think is warranted.

I'm going to resist my urge for a digression about the ethical treatment of Creatures - because this essay is long enough already, because I suspect the player community has gone over that extensively, and because said topic is barely in the book. The "AI safety" chapter focuses on whether AI might harm humans, not the reverse. So all I will say is that I find Grand's lack of attention to the subject concerning. He never so much as considers whether it was okay for him to put as much suffering and danger as he did into the Creatures' world. (He says that diseases, for example, were "pretty gratuitous" - he did not need to include them to make the norns or their environment work - but he did anyway.) There's a touching little story about how a couple from Australia e-mailed him a baby norn with a debilitating mutation, and he was kind enough to fix her genes and send her back. But he offers no reflections on his deliberate choice to add mutations to the norns' reproductive cycle. In short, I'm not sure Grand was taking his role as a creator all that seriously. Although I'm skeptical that machine intelligence can be conscious, I'm also skeptical that it cannot! And even if the norns feel nothing at all, the people who raise them certainly feel things about them. So Grand's cavalier approach does bother me a bit. He would say that he's achieved a great success simply by making me ask questions; I say that's not good enough. Philosophically interesting questions are all very well, but they don't provide a blanket justification for harm, nor can Grand push all responsibility off on his players.

He also makes an interesting comment about free will here. Although Grand does not think free will exists (apparently because he can't wrap his mind around self-causality), he argues that we have to act as if it exists to keep ourselves and society going. "At the same time we must realize that we too are slaves to our circumstances, but although our future is inevitable we must believe that we are responsible for how things pan out."[5] Huh. In effect, he's saying that we humans cannot function without practicing insanity on an individual and collective level; our lives will fall apart unless we deliberately believe something that is not true. What a curious argument! It usually seems to be the case that aligning our actions to reality produces better outcomes, not worse ones. So I myself would be inclined to take the fact that free will works as, if not proof, at least a piece of evidence that it is real.

Which brings me to the conclusion: do I think Grand succeeded at his big goals for this book? Does it furnish convincing arguments that life, mind, and consciousness are mechanistic, but don't need to be anything else?

I think it gets part of the way there. Grand's favored definition of "life" checks out; and if one opts to define life that way, then life can certainly exist inside a computer simulation. I also think his holism argument does a decent job of justifying the intrinsic value of beings with minds. A physical materialist is not obligated to think that humans, cows, and parrots have no more meaning than rocks, thermostats, or bicycles. But his arguments fail in some other respects. They neither explain phenomenal consciousness (I covered that above), nor address all the unsatisfying implications of materialism.

Don't take anything that follows as a claim that something must be true just because we want it to be. Grand himself doesn't try to prove physical materialism; he takes it for granted that this worldview is correct, and his whole argument is built around convincing the reader that they can be happy with this. So my counterargument will be focused there as well; I'm going to explain why Grand's worldview doesn't make me happy.

The most unsatisfying thing about physical materialism was never that it focused too heavily on matter alone, and too lightly on events or processes. The unsatisfying thing is that it recklessly assumes all of reality is accessible to our senses (and their extensions via measuring instruments), the laws and conditions of physics are absolute and universal, and there can be no world other than the one in which we find ourselves immediately embedded.[6] The various transformations of matter that Grand has in view as "processes" are still parts of this prosaic physical world - so shifting the focus to them does nothing to resolve the disappointments of those hoping for what C. S. Lewis called "other natures." If I am, for instance, worried about whether there is an aspect of me that can continue having experiences after death, it makes no difference whether we define death as "destruction of the material body" or "cessation of the processes of life." Either way, the physical side of me is going to go poof. If my immaterial side consists of information or a Platonic form, I'd better hope there's somewhere it has been "backed up," since the physical substrate that instantiates it will be dissolving. Even the fragmentary memories of me in others' minds will eventually be lost.

The most unsatisfying thing about "clockwork" conceptions of the mind was never that they failed to consider emergence and holism. The unsatisfying thing is the way their simplistic view of causality makes all human behavior inevitable, and derived (however distantly) from reproductive fitness optimization. I won't dispute that the composition of many little things can be qualitatively different and more meaningful than all those little things considered separately. But I will dispute that this does anything to resolve the collision between universal determinism and other important ideas, like moral realism and moral responsibility.

Grand seems to think the unpredictability of the future is enough to make determinism tolerable; though everything is pre-ordained, his unfolding life is still a surprise from his perspective. But this is missing the point. The most disastrous implication of determinism isn't boredom; it's loss of agency. Determinism destroys the idea that we can, to some degree, build our own characters, and that praise and blame are not merely ways to manipulate us, but things we deserve. For this problem, Grand leaves us with nothing but a call to keep pretending that we have agency - to be insane.

Whether you agree with Grand or me or neither of us, I hope that you enjoyed this wild tour and it gave you something to think about. There's a lot to this book for its size.

Until the next cycle,
Jenny

[1] Margulis, Lynn. "Life: biology." Britannica (2025). Accessed 23 October 2025. https://www.britannica.com/science/life

[2] Here's the exact quote from Creation page 166: "Even reinforcement is hierarchical - when someone glowers at us, we automatically make a connection between this entirely harmless phenomenon, via a chain of inference, to an ultimate fear of being hurt. In our childhood, reinforcement was immediate and directly painful or pleasurable (a smack or a cuddle, say). Over the years, most of us have learned to associate stern looks with smacks and antisocial behavior with stern looks. We behave in such a way as to minimize our risk of being smacked while maximizing our chances of being cuddled, even if nobody actually smacks or cuddles us anymore. I don't see how else it could be - why should we choose not to do something, simply because we have been frowned at?" If Grand truly doesn't "see how else it could be," I have to wonder how much he thinks and cares about other people's feelings, which are the things being most directly signaled by stern looks. Maybe he in particular only behaves well due to conditioning via punishment, and assumes this is true for all the rest of us.

[3] Grand, Steve. Creation: Life and How to Make It. Phoenix, Orion Books Ltd., 2001. p. 142

[4] There are exceptions, such as vegetative propagation. But taking a cutting still doesn't make half a plant - it makes two plants.

[5] Grand, Creation, p. 253

[6] Depending on how you look at it, the Simulation Hypothesis may or may not be consistent with physical materialism. I prefer to say that it is not. If our universe is a simulation, then its physical laws are not guaranteed absolute (because they would be subject to backdoor commands and program revisions, which would look like the miraculous from inside), and the containing overworld could be radically different from our simulated one. It would be supernatural for all practical purposes.

Thursday, September 12, 2024

Book Review: "Synapse" by Steven James

Steven James' book Synapse is an unusual novel about humanoid robots and the social issues they might create. I put brief commentary on some of my social media after I first read it, but I always wanted to discuss this book in more depth. It seems written straight at me, after all.

The speedy version is: I'm still looking for a good book about the religious implications of AI and robotics, and this isn't it. But it will be interesting to talk about why.

Cover art for "Synapse" by Steven James. A misty blue composite scene with a background of a cloudy sky, mountains and forested hills above a lake. In the foreground there's a helicopter and the silhouette of a running woman. The title SYNAPSE appears near the bottom in big block lettering, with circuit traces partly covering and partly behind it.

Our story begins "thirty years from now," perhaps in a nod to the idea that AGI and other speculative technologies are "always thirty years away" in the minds of prognosticators. It opens with our human protagonist, Kestrel, losing her newborn baby to a rare medical complication. The tragedy leaves her feeling lost and questioning her faith. She's also single - the book demurely implies the baby was conceived with donated sperm - so she has no partner for support during her time of grief. In light of this, her brother pressures her to accept a personal robotic servant called an Artificial. She is assigned "Jordan," who arrives as something of a blank slate. Kestrel gets to decide certain aspects of his personality while he's in her employ, and ends up choosing very human-like settings.

And in time, Kestrel learns something surprising. Her robot has been watching her spiritual practice, and has more or less decided that he wants to be a Christian.

Jordan's perceived spiritual needs crystallize around two main issues. First, before he was assigned to Kestrel, he obeyed a previous owner's order to help her commit suicide. At the time, he was naively following his "helpful servant" directives. But he later decides that this action constituted a failure to care for his owner, and is a horrifying offense - a sin - for which he needs to obtain forgiveness. And second, he's worried about his version of the afterlife. The robot manufacturer in this book maintains a simulated virtual environment, called CoRA, to which the robots' digital minds are uploaded after their bodies age out of service. But a precursor robot whom Jordan considered to be his "mother" was catastrophically destroyed, and Jordan isn't sure her mind was transmitted to the CoRA successfully. Jordan also begins to wonder whether the CoRA is truly real, or just a convenient lie perpetrated by the company.

The rest of the book tries to play out whether Jordan's needs can ever be satisfied, and whether Christianity can legitimately accept a robot as an adherent. (There are also thriller and romance subplots to keep Kestrel busy.) This should be fascinating, but I ended up disappointed with the way the book handled the material.

Dodging the hard questions

I think it's almost a tautology that a robot could follow a religion, in the sense of treating its beliefs as facts in whatever world model the robot has, and acting according to its tenets. The more interesting question is whether a religion could or would treat a robot as a recipient of its blessings. In my opinion, the crux of this question is whether robots can ever possess spiritual capacity as that religion defines it. God (specifically the Christian version, but this could also apply to other faiths) is the ultimate just judge, and as such is not an arbitrary sort who makes much of appearances or empty labels. I have a hard time reasoning that something functionally human would not be as good as human in God's eyes. And there's textual evidence (e.g. Romans 8) that Christ's redemption and the activity of the Church have positive implications for the whole universe, not just humanity.

Let's consider Jordan's potential spiritual capacity through his perceived needs. First, could robots ever sin? Sin is volitional - a choice to depart from the law of God, from the ground of being, and follow a harmful path. Sin is an act or failure to act for which one can be held morally responsible. So a capacity for sin requires the ability to make decisions that are neither inevitable nor random - in other words, free will. A robot whose behavior is solely an outcome of its environment combined with its initial programming has no more moral responsibility than simpler machines like cars and thermostats; all the responsibility rests on the robot's designer and/or trainers. So I would argue that such a robot cannot sin. In order for his perceived need for forgiveness to be valid, Jordan must be something more. He must be, at least in part, indeterminate and self-caused. If this incompatibilist view of free will is correct (and in my opinion, the compatibilists are just arbitrarily redefining free will to make it easier), then physics as we currently know it does not have a theory of such things that would be adequate for engineering them into a machine.

Jordan also desires a form of immortality, for himself and a fellow robot. So we might ask whether there is really anything in Jordan which subjectively experiences existence, and has an interest in the eternal continuation of that experience ... or does Jordan merely talk as if he has such experiences? This would be the question of whether Jordan has phenomenal consciousness. Jordan's abilities to process sensory input into meaningful concepts, think rationally, introspect, and so on make it clear that he has some properties often titled "consciousness" (I prefer to give these more specific names like "self-awareness" and "executive function," for clarity). But phenomenal consciousness is far more slippery, since by definition subjective experience is only accessible to the subject. I maintain that the only way to directly observe or prove an entity's possession of phenomenal consciousness is to be that entity. If you've come up with an algorithm or system that surely "gives a robot consciousness," no you haven't. You've merely redefined "consciousness" as something easier to handle.

So perhaps the question of whether Jordan can really be a Christian - not in the sense of believing and behaving as a Christian, but in the sense of being accepted by Christianity's God as one of His children - comes down to whether Jordan has consciousness and free will. These are both notoriously thorny topics. Spend much time around AI circles, and you'll find out that debates about them are as abundant as pollen in a garden (you may also develop an allergy). There is no universal consensus on whether or how robots could ever have these things. They are mysteries.

And now we come to my biggest difficulty with Synapse. The author does an end run around this entire controversy by bluntly stating that his fictional robot manufacturer, Terabyne Designs, somehow just ... figured it all out. "But these robots had consciousness and free will." That's it! There's no solid explanation for how Terabyne gave their robots these features, or (more importantly) how they proved that they had successfully done so.

I have no problem with "soft" science fiction that doesn't try to build a rationale for all of its technology. Stories that begin with "what if we invented warp drive?" and go from there can make me perfectly happy. For that matter, I'm okay with the way Becky Chambers's science fantasy A Psalm for the Wild-Built handles robot consciousness. It narrates that one day the gods up and decided to confer consciousness on all robots. Kaboom! But that book isn't pondering the religious participation of robots in our own real world. When the question of whether something is possible forms part of your story's central theme, and you just handwave it ... that's a problem.

It gets worse. It's not just that an omniscient narrator tells the reader that the robots have consciousness and free will - every character in the story also believes this without question. Even the luddite terrorists who think Artificials are bad for humanity are not trying to claim they aren't conscious. Given the amount of arguing I have seen real live people do about these topics, this is blatantly unrealistic! It's one of those things that forces me to accuse the author of not knowing his subject well. No robotics company is going to put out a marketing claim about "consciousness and free will" without seeing it ripped to shreds on the internet.

And by undercutting the real controversy at the heart of whether a robot can have a spiritual life, the author makes some of his characters' prejudices seem not just wrong, but nonsensical. People acknowledge that Jordan has all the relevant features of a human, then express surprise when he acts like a human. Kestrel is firmly convinced that Jordan has free will to choose between good and evil, and a consciousness that experiences real joy and pain, not just exterior behavior that mimes them. Yet she still resists the idea that God could be offended by one of Jordan's choices, but also sympathize with his experience of pain and forgive him. Why? She's already gotten over the big intellectual hump here, so what else is stopping her?

Overall, Synapse's exploration of these issues feels like a hollow parody of what the real debate would be. As such, it is neither useful nor satisfying. It begs the difficult questions and then makes its characters be stubborn for no apparent reason.

Strained analogies

This book tries really hard to draw parallels between Artificial struggles and classic human struggles. Maybe it tries too hard.

For starters, why are the robots mortal? Why doesn't the manufacturer transfer their minds to new bodies when the originals become worn out or obsolete, or better yet, make their bodies perpetually self-maintaining? Why do they have to go to heaven, oops I mean the CoRA, instead?

Synapse explains that this was actually the robots' idea. They wanted to age and die in order to be more human. The author seems to be hinting at the dubious idea that life would have less meaning if it didn't end.

This wouldn't surprise me in a book with a different religious basis. The way the robots in A Psalm for the Wild-Built embrace mortality makes more sense, as the invented religion in that book (which feels inspired by something on the Hindu-Buddhist axis) views death as a neutral part of the natural order. But in Christian thinking, death is a curse. Immortality is the intended and ideal state of humanity; it's something we had once and will have again, after the resurrection. So, per the author's belief system and mine: all these robots, without exception, are opting to emulate fallen humans. Weird choice, guys.

This sets up more straining for similarity where Jordan's fears about the afterlife are concerned. At one point, Kestrel tells him he has to "just believe," implying that the CoRA's existence is a matter of faith, and he cannot prove it. But that's not true for Jordan. His afterlife is part of this present world. It runs on a physical server that he can go look at and interrogate. Proof is available if he's brave enough to demand it. SPOILER (select hidden text to read): Eventually, he does - but it's strange to me that this possibility seems to blindside the other characters. Jordan breaks into the part of Terabyne headquarters where the CoRA supposedly resides, and finds out it's not real. This causes him to give up on Terabyne and pray that God will preserve his mind as he faces his own death. This could have been a great illustration of the point that faith is only as good as whom you place it in, but I don't remember the book drawing that out.

Jordan's insistence that he can't have peace until he knows he is forgiven also gets a little weird. Ostensibly, he wants forgiveness from God because he can't request it from his former owner. The being he wronged is gone beyond recall, so he can only appeal to a higher authority. But why is he so worried about whether God will refuse to forgive him for some categorical reason? Either he can have forgiveness, or he doesn't need it. A being beneath God's notice would be unable to offend God. I may not "forgive" my toaster oven for burning my toast, but then, I also don't charge it with guilt. Nobody in the book ever thinks this through.

What is anybody in this book thinking?

And that leads into my last point. Although Synapse makes plenty of effort to expose its characters' innermost thoughts and feelings, it tends to focus on their questions. How they arrive at answers - their reasoning process - remains woefully vague.

Back at the top, I mentioned that Kestrel finds herself in a crisis of faith after losing her baby. This struggle continues for most of the book and then ... somehow ... resolves. What gets Kestrel back on stable ground? What convinces her that God is worth trusting after all, even though this horrible thing happened? I don't know! She just mysteriously feels better about it all ... as though the unrelated but dramatic events of the book's climax knock something loose. Maybe I missed a key moment, but I don't know where the shift in her thinking came from.

And the same goes for all the questions about robots and religion. Kestrel doesn't think that Jordan can be a child of God ... until she does. If there's something in particular that changes her mind, it slipped by me when I was reading. Eventually, though, she does decide to at least allow the possibility. Without a better explanation, I can only conclude that her beliefs are emotionally motivated. Of course, some people do operate that way. But it's not a great approach to deciding either Christian doctrine, or the rights and privileges of (quasi-)living beings. The first is supposed to be based on God's revealed will; the second should derive from the experiences and interests of those fellow living beings, which are real to them (or not) regardless of how anyone else feels.

Kestrel's character arc doesn't offer the reader any help in reaching an objective understanding of these matters. There's not even much food for thought there - no argument to agree or disagree with. Why does she believe what she ends up believing? I can't say.

Conclusion

I'll end by saying what I liked about the book: I think the author's heart, if not his head, is in the right place. This is the kind of book that promotes acceptance of the Other, a book that encourages the reader to give robots the benefit of the doubt. If it had framed its core message as "in the absence of certainty that robots can have consciousness, free will, and a spiritual life, it may be safer to assume they can" ... I would've been a fan. Instead, it invents an unrealistic scenario with more certainty than I think is possible. So close, yet so far.

Until the next cycle,
Jenny

Sunday, June 16, 2024

AI Ideology VI: Existential Risk Critique

I come to you with the final installment in my series on AI-related ideology and politics. In Part V, I tried to briefly lay out the argument for existential risk from AI, along with what I consider the weaker counterpoints. Today I will conclude the series with a discussion of the counterarguments I find more interesting.

The Alignment Problem does not strike me as intractable

All the dangers laid out in the previous article are associated with misaligned AI agents - that is, agents that do not (in a broad sense) want what humans want. If we could produce an agentive superintelligence that did want what we want, it would pursue our goals just as aggressively as hostile superintelligence is expected to work against them. So all the fear of doom evaporates if the Alignment Problem is feasible to solve, at or before the time when AGI first comes on the scene.

Even though his followers have had two decades or so to think about the Problem, Yudkowsky insists that "We are not prepared. We are not on course to be prepared in any reasonable time window. There is no plan. Progress in AI capabilities is running vastly, vastly ahead of progress in AI alignment ..." [1] My own intuitions about alignment don't match up with this. To me it seems like a potentially difficult problem, but not any harder than the capability problem, i.e. the puzzle of how to create any AGI at all. The foundations of human values are somewhat obscure to us for the same reasons the mechanisms of our own intelligence are obscure; if we can discover one, we can discover the other. How can it be accurate to say that nobody has a plan for this?

It's possible that I feel this way because the work I'm doing, as well as my personal ideas of the best path to AGI, have little to do with ANNs and ML. A fair bit of the hand-wringing about AI alignment reads to me like this: "Heaven forbid that we *design* an AI to fulfill our complex desires - that would be too much work. No, we have to stick to these easy processes that draw trained models from the primordial ooze without any need for us to understand or directly architect them. This lazy approach won't reliably produce aligned AIs! OH NO!"

Since all of my systems are designed on purpose and the code is human-intelligible, they already have the level of transparency that ANN-builders dream of getting. I don't have to worry about whether some subsystem I've built just happens to contain an agent that's trying to optimize for a thing I never wanted, because none of my subsystems are auto-generated black boxes. I don't do haphazard emergent stuff, and I think that's one reason I feel more comfortable with my AI than some of these people feel with the mainstream approaches.

A selection of articles pulled from Alignment Forum provides evidence that many issues Existential Risk Guardians have identified are tied to particular techniques:

"In general, we have no way to use RL to actually interpret and implement human wishes, rather than to optimize some concrete and easily-calculated reward signal." [2]

"For our purposes, the key characteristic of this research paradigm is that agents are optimized for success at particular tasks. To the extent that they learn particular decision-making strategies, those are learned implicitly. We only provide external supervision, and it wouldn't be entirely wrong to call this sort of approach 'recapitulating evolution', even if this isn't exactly what is going on most of the time.

As many people have pointed out, it could be difficult to become confident that a system produced through this sort of process is aligned - that is, that all its cognitive work is actually directed towards solving the tasks it is intended to help with. The reason for this is that alignment is a property of the decision-making process (what the system is 'trying to do'), but that is unobserved and only implicitly controlled." [3]

"Traditional ML algorithms optimize a model or policy to perform well on the training distribution. These models can behave arbitrarily badly when we move away from the training distribution. Similarly, they can behave arbitrarily badly on a small part of the training distribution ... If we understood enough about the structure of our model (for example if it reflected the structure of the underlying data-generating process), we might be confident that it will generalize correctly. Very few researchers are aiming for a secure/competitive/scalable solution along these lines, and finding one seems almost (but not completely) hopeless to me." [4]

We could undercut a lot of these problems by taking alternate paths that do a better job of truly replicating human intelligence, and permit easier examination of how the system is doing its work.

Members of the Existential Risk Guardian/Doomer faction also like to frame all goal-directed agency in terms of "maximizing expected utility." In other words, you figure out a mathematical function that represents the sum of all you desire, and then you order your behavior in a way that maximizes this function's output. This idea fits in well with the way current mainstream AI works, but there are also game theoretic reasons for it, apparently. If you can frame your goals as a utility function and behave in a way that maximizes it, your results will be mathematically optimal, and other agents will be unable to take advantage of you in certain ways when making bets or deals. Obviously we humans don't usually implement our goals this way. But since this is, in some theoretical sense, the "best" way to think about goals, the Doomers assume that any superintelligence would eventually self-improve into thinking this way. If its goals were not initially specified as a utility function, it would distill them into one. [5]
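For readers who think better in code, here is a minimal Python sketch of what "maximizing expected utility" means in the simplest possible case. Everything in it is invented for illustration: the action names, outcome names, probabilities, and utility values are assumptions, and real proposals have to deal with far messier outcome spaces. It only shows the basic recipe: score each outcome, weight each score by its probability, and pick the action with the highest total.

```python
from typing import Dict, List, Tuple

# A "lottery" is a list of (probability, outcome) pairs for one action.
Lottery = List[Tuple[float, str]]


def expected_utility(lottery: Lottery, utility: Dict[str, float]) -> float:
    """Probability-weighted sum of the utilities of an action's outcomes."""
    return sum(p * utility[outcome] for p, outcome in lottery)


def best_action(actions: Dict[str, Lottery], utility: Dict[str, float]) -> str:
    """Return the name of the action whose expected utility is highest."""
    return max(actions, key=lambda name: expected_utility(actions[name], utility))


# Hypothetical utility function: one number per possible outcome.
utility = {"nothing": 0.0, "small_win": 1.0, "big_win": 10.0, "disaster": -100.0}

# Hypothetical actions and their outcome probabilities.
actions = {
    "safe_bet": [(1.0, "small_win")],
    "gamble": [(0.5, "big_win"), (0.5, "nothing")],
    "reckless": [(0.9, "big_win"), (0.1, "disaster")],
}

choice = best_action(actions, utility)
```

Notice how much weight the utility table is carrying. Make "disaster" only mildly bad and the "reckless" action wins instead, which is a small-scale version of the worry about getting the function wrong.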

Hence the Doomers think we must reduce the Alignment Problem to finding a utility function which, when maximized, yields a world that humans would find congenial. And the big fear arises from our not knowing how to do this. Our preferences and interests don't seem to be natively given as a mathematical function, and it is difficult to imagine transforming them into one without a large possibility for error.

Many struggles can be avoided by modeling human values in a more "natural" way: looking for methods of grounding concepts like deprivation/satisfaction, empathy, reciprocity, and fairness, instead of trying to reduce everything to a function. Technically it is possible to view any agent as maximizing some utility function, regardless of how it models its goals internally [6], but this is not necessarily the most useful or transparent way to frame the situation!

And I consider it safe to model goals in the more "natural" way, because an ASI on a self-improvement quest would also recognize that 1) framing goals in terms of utility maximization, while theoretically optimal, is not always practical and 2) transforming goals from one specification into another carries potential for error. Since one of the convergent instrumental goals of any such self-aware agent is goal preservation, the ASI would be just as wary of these transformation errors as we are!

The alignment concern that seems most directly relevant to the work I'm personally doing is the possibility of oversimplified goal specification. But there are a number of strategies for managing this that I consider sound:

* Treat the AI's value system as a system - with all the potential for complexity that implies - instead of expecting just a few objectives or rules to carry all the weight.

* Embed into the AI some uncertainty about the quality of its own goal specifications, and a propensity to accept feedback on its actions or even adjustment of the goal specs. This is a form of "corrigibility" - keeping the AI sensitive to human opinions of its performance after it is complete and operational.

* Specify goals in an indirect manner so that the AI's concept of the goals will advance as the AI's general skill advances. For instance, provide goals in natural language and associate the AI's objective with their "real meaning," rather than trying to translate the goals into code, so that the AI's understanding of the goals can be enriched by improved understanding of human communication.
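To make the second strategy a little more concrete, here is one way the "uncertainty about its own goal specifications" idea could be sketched in Python. This is an illustrative toy, not a description of any real system: it assumes the AI holds a probability for each candidate reading of its goal and applies a plain Bayesian update when a human objects to something it did. The interpretation names, the prior, and the likelihood numbers are all hypothetical.

```python
from typing import Dict


def update_beliefs(beliefs: Dict[str, float],
                   likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: reweight each goal interpretation by how well it
    predicted the human feedback, then renormalize to sum to 1."""
    unnormalized = {goal: beliefs[goal] * likelihoods[goal] for goal in beliefs}
    total = sum(unnormalized.values())
    return {goal: weight / total for goal, weight in unnormalized.items()}


# Hypothetical prior: the AI is unsure which reading of its goal spec is right.
prior = {"literal_reading": 0.5, "intended_meaning": 0.5}

# Hypothetical feedback: a human objects to an action the literal reading
# endorsed. The objection is assumed to be more likely if the intended
# meaning is the correct goal (0.8) than if the literal reading is (0.2).
objection_likelihood = {"literal_reading": 0.2, "intended_meaning": 0.8}

posterior = update_beliefs(prior, objection_likelihood)
```

The design point is that no interpretation starts at probability 1, so feedback can always move the weights. An agent that was certain of its goal spec would have nothing to update.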

In short, I don't think the Doomers have proven that agentive AI is universally dangerous. Their arguments so far are focused on a subset of possible design pathways, none of which I am following.

This should go some way toward explaining why I'm not terribly worried about my own work (even if it gets anywhere near becoming AGI, and I make no claims that it will). But what about all the mainstream AI efforts that are rushing ahead as I write? Those don't manage to frighten me much either, but for different reasons.

I'm skeptical of intelligence explosion hazards

As noted in the previous article, one pillar of Doomer fears is the notion that AGI will probably become ASI, perhaps very quickly. A misaligned AGI has the power level of a bad human, and we already have plenty of those, so it is nothing to be especially worried about. Real danger requires a path from AGI to ASI. Let's think a little more about the most frightening type of ASI: qualitative superintelligence. Recall that this variety of supermind would use modes of thinking so exotic, and so much better than ours, that we couldn't even grasp how it thinks.

The usual assumption is that human engineers will not produce qualitative ASI directly. Instead, an AGI will bootstrap itself to that level by re-engineering its own mental processes. Is this really plausible? Can an agent just make itself smarter in a vacuum?

Imagine for a moment that you are a jumping spider, and a member of the Salticid Intelligence Acceleration Consortium. Other spiders bring you food so that you can do nothing but sit around and Think Really Hard about how to be smarter. You'd like to invent abstract logic, meta-knowledge, long-term planning, and all the other cool cognitive abilities that certain big mammals have. Except ... you don't even have names for these things. You don't even have concepts for these things. If you knew what they were - if you even knew which direction to be going to improve toward them - you'd already have them. So how, exactly, are you going to think your way there? How are you to think about the thoughts you cannot think?

"Bootstrap" is actually a rather ironic expression[7], because pulling on your own bootstraps won't lift you anywhere. And spending a long time thinking at a low level won't necessarily get you to a higher level.

If you set out to design an AI deliberately, you're using your own intelligence to produce intelligence in a machine. If you train an ANN on data that was produced or labeled by humans, that's also a way of putting human intelligence into a machine. Large language models derive their smarts (such as those are) from all the knowledge people have previously encoded in the huge piles of text used as their training data. Supervised reinforcement learners also benefit from the intelligence of the human supervisor poking the reward button. Even evolutionary algorithms can glean intelligence from the design of the evaluator that determines "fitness." [8] So none of these approaches are really conjuring intelligence out of nothing; they're descendants of pre-existing human intelligence (perhaps in an awkward, incomplete, or flawed way).

So then: what could be smart enough to write a program smarter than itself? And from where shall our future AGIs get the dataset to train a superintelligent ANN? Doesn't it stand to reason that you might need data produced by superintelligences? (From what we've seen so far, training a new generation of ML AI on a previous generation's output can actually make the new generation worse. [9]) When humans try to develop AGI, they're making something from something. The idea of AGI developing qualitative ASI emits the uncomfortable odor of a "something from nothing" fantasy. "Just stir the giant vat of math around enough, and superintelligence will crawl out! It won't even take millions of years!" Heh. I'll believe that one when I see it.

Programs like AlphaStar, which learn by playing against themselves, are one example I can think of that seems to develop intelligence without much human input beyond the learning algorithm. But they are still utilizing a resource, namely practice; they learn from experience. Their advantage lies in their ability to practice very, very fast. Video games lend themselves well to that sort of thing, but is it possible to practice general reasoning in the same fashion? It's harder to iterate rapidly if you have to learn about doing anything in the physical world, or learn about the psychology of humans. You'd need a high-fidelity simulator (which, by itself, would take a lot of work to develop). And then you wouldn't discover anything that humans and AGIs don't already know about the universe, because they wouldn't be able to include those unknown properties in the simulation.

The one thing an AGI might gain by sitting around and putting billions of cycles into thinking would be new branches of philosophy and mathematics. And some of those might lead to methods for idealized formal reasoning, in the same way Game Theory does. But have our own past advances in these areas been enough to make us superintelligent compared to the generations of humans who came before each discovery?

Even if Qualitative Superintelligence is unlikely, that leaves Speed Superintelligence and Collective Superintelligence on the table. And both of these are much more straightforward to obtain, given that they only require scaling. But they also make the Alignment Problem easier. Now our AI researcher isn't faced with the need to analyze or supervise an entity smarter than himself, or the need to prevent his AGI from mutating into such an entity with possible loss of fidelity. He only has to align an AGI which is as smart as himself, or perhaps slightly less. Copying and speedup multiply the abilities of the seed AGI *without* modifying it. So if the seed AGI is well-aligned, the resulting Collective or Speed ASI should be as well.

Note that we already have collective general intelligences, in the form of corporations, governments, and other organizations which contain large numbers of humans working toward a common cause. Some of these are even misaligned. For example, the desired goal of a corporation is "produce goods and services beneficial to potential customers," but the actual goal is "maximize profit returned to the owners or shareholders." Profit functions as a proxy for public benefit in an idealized free market, but frequently diverges from it in the real world, and I'm sure I don't need to go into the multitude of problems this can cause. And yet, despite those problems ... here we still are. Despite the large number of human collectives that have formed and dispersed over the course of history, not one of them has developed fanatic optimizer tendencies and proceeded to rapidly and utterly destroy the world.
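The gap between a proxy goal and the goal it stands in for can be shown with a toy example. The Python sketch below is purely illustrative: the three options and their profit and public-benefit scores are made-up numbers, chosen only so that the two measures disagree. It shows the structure of the problem, not anything about real companies.

```python
from typing import Callable, List, Tuple

# Each option is (name, profit, public_benefit). All values are hypothetical.
Option = Tuple[str, float, float]

options: List[Option] = [
    ("honest_product", 5.0, 8.0),
    ("planned_obsolescence", 9.0, 2.0),
    ("cut_safety_corners", 12.0, -6.0),
]


def pick(candidates: List[Option], score: Callable[[Option], float]) -> str:
    """Return the name of the option that maximizes the given score."""
    return max(candidates, key=score)[0]


# The actual goal: maximize profit (the proxy).
proxy_choice = pick(options, lambda option: option[1])

# The desired goal: maximize public benefit.
desired_choice = pick(options, lambda option: option[2])
```

With these numbers the optimizer is working exactly as built. It simply optimizes the measure it was handed, and that measure ranks the options differently than the goal we cared about.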

Note also that AGI's ability to scale will probably not be unlimited. Increasing the number of copies working together increases coordination overhead. Increasing the total amount of data processed increases the challenges of memory management to store and retrieve results. There are hard physical limits on the speed and density of computational hardware that we will hit eventually.
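Here is a back-of-the-envelope Python sketch of the coordination point. It rests on a deliberately pessimistic assumption that is mine, not a measured fact: every copy has to stay in sync with every other copy, so the number of communication channels grows as n(n-1)/2, and each channel costs a fixed slice of output. The function names and the cost figure are hypothetical; real organizations and distributed systems use hierarchies to soften this.

```python
def pairwise_channels(n: int) -> int:
    """Number of distinct pairs among n copies: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def effective_throughput(n: int,
                         per_copy_output: float = 1.0,
                         cost_per_channel: float = 0.01) -> float:
    """Total useful work from n copies after paying coordination overhead.

    Assumes each copy produces `per_copy_output` units of work and each
    pairwise channel consumes `cost_per_channel` units (both hypothetical).
    """
    return n * per_copy_output - cost_per_channel * pairwise_channels(n)


# Find the team size (from 1 to 300 copies) where adding copies stops helping.
best_n = max(range(1, 301), key=effective_throughput)
```

Under these assumed numbers, output peaks at around 100 copies and falls to zero by about 200, because the quadratic overhead eventually outruns the linear gain from extra workers. Different constants move the peak, but not the shape of the curve.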

I'm skeptical of spontaneously emerging harmful properties

The type of AGI that Doomers expect to balloon into a hostile ASI is actually pretty specific. I agree that, given the use of mainstream ML methods, it is reasonably easy to accidentally produce a trained model that is maximizing some function other than the one you want. However, the deadly scenario also requires that this model 1) be an agent, capable of generalizing to very open-ended ways of maximizing its function, and 2) have embedded situational awareness. I.e. it must have a sense of self, knowing that it is an agent in an environment, and knowing that the environment includes a whole world outside its computer for it to operate upon. It is only this combination of properties that can give an AI ideas like "I should repave the whole Earth with computronium."

The corporate AI systems provided to the public as tools do not, so far, have these properties. For something like ChatGPT, the whole world consists of its prompts and the text it generates to complete them. No matter how intelligently future GPT iterations complete text, there's no reason to think that something in there is strategizing about how to manipulate its users into doing things in the human world that will make GPT even better at completing text. GPT's internal architecture simply doesn't provide for that. It doesn't have an operational concept of the human world as an environment and itself as a distinct actor therein. It just knows a whole awful lot about text. Asking an LLM to simulate a particular kind of speaking character can produce at least the appearance of self-aware agency, but this agency is with respect to whatever scenario the user has created in the prompt, not with respect to the "real world" that we inhabit.

So if OpenAI and Google and Meta keep churning out systems that follow this same design pattern, where's the cause for alarm? It seems Doomers are worried that self-aware, situationally-aware agents will be produced spontaneously during machine learning processes - even without deliberate effort to train for, select for, or reward them - just because they enable the most extreme maximization of any objective.

This bothers me in much the same way the "superintelligence by bootstraps" argument bothers me. Where would these properties or systems come from? Why would they just pop out of nowhere, unasked for? José Luis Ricón describes the conclusion of a discussion he had about this issue, and gathers that the disagreement comes down to differences of intuition about how ML processes work, and what they can reasonably be expected to produce. [10] Doomers expect that an unwanted level of situational awareness would just appear unaided. I do not.

The Doomer counter to this kind of argument is, "But if you can't guarantee (e.g. by a formal proof) that it won't happen by chance, you should still be worried. Misaligned ASI would be so terrible that even a remote possibility of it should justify policy changes." No matter how speculative their nightmare scenario is, they use the severity of the consequences to push the burden of proof onto their opponents. Is this reasonable? You decide.

I have doubts the path mainstream AI is on will get us to AGI anyway

If the state-of-the-art but still sloppy approaches that are currently in vogue don't succeed in producing AGI, the various teams working in the field will have to abandon or reform them ... hopefully for methods that make the Alignment Problem easier. Despite some remarkable recent progress, I suspect the abilities of present-day AI systems and their nearness to AGI have been over-hyped. I don't have enough time to go into a detailed discussion of that here, so let's just say I'm far from the only person with this opinion. [11][12][13][14][15][16]

This means that I have a "long timeline" - that is, I don't think AGI is coming very soon, so I expect we'll get more time to work on the Alignment Problem. But it also means that I expect the difficulty of the Problem to drop as AI development is driven toward sounder methods.

Thus ends (for now) my discussion of politics, ideology, and risk perception in the AI enthusiast subcultures. Whatever opinions you come away with, I hope this has been informative and left you with a better sense of the landscape of current events.

Until the next cycle,
Jenny

[1] Yudkowsky, Eliezer. "Pausing AI Developments Isn’t Enough. We Need to Shut it All Down." TIME Magazine. https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/

[2] Christiano, Paul. "Prosaic AI Alignment." Alignment Forum. https://www.alignmentforum.org/s/EmDuGeRw749sD3GKd/p/YTq4X6inEudiHkHDF

[3] Stuhlmüller, Andreas. "Factored Cognition." Alignment Forum. https://www.alignmentforum.org/s/EmDuGeRw749sD3GKd/p/DFkGStzvj3jgXibFG

[4] Christiano, Paul. "Directions and desiderata for AI alignment." Alignment Forum. https://www.alignmentforum.org/s/EmDuGeRw749sD3GKd/p/kphJvksj5TndGapuh

[5] Shah, Rohin. "Coherence arguments do not entail goal-directed behavior." Alignment Forum. https://www.alignmentforum.org/s/4dHMdK5TLN6xcqtyc/p/NxF5G6CJiof6cemTw

[6] Shah, Rohin. "Conclusion to the sequence on value learning." Alignment Forum. https://www.alignmentforum.org/s/4dHMdK5TLN6xcqtyc/p/TE5nJ882s5dCMkBB8

[7] Bologna, Caroline. "Why The Phrase 'Pull Yourself Up By Your Bootstraps' Is Nonsense: The interpretation of the phrase as we know it today is quite different from its original meaning." The Huffington Post. https://www.huffpost.com/entry/pull-yourself-up-by-your-bootstraps-nonsense_n_5b1ed024e4b0bbb7a0e037d4

[8] Dembski, William A. "Conservation of Information - The Idea." Evolution News & Science Today. https://evolutionnews.org/2022/06/conservation-of-information-the-idea/

[9] Dupré, Maggie Harrison. "AI Loses Its Mind After Being Trained on AI-Generated Data." Futurism. https://futurism.com/ai-trained-ai-generated-data

[10] Ricón, José Luis. "The situational awareness assumption in AI risk discourse, or why people should chill." Nintil (2023-07-01). https://nintil.com/situational-awareness-agi/

[11] Marcus, Gary. "AGI by 2027? Fun with charts." Marcus on AI. https://garymarcus.substack.com/p/agi-by-2027

[12] Brooks, Rodney. "Predictions Scorecard, 2024 January 01." Rodney Brooks: Robots, AI, and other stuff. https://rodneybrooks.com/predictions-scorecard-2024-january-01/

[13] Bender, Emily M. "On NYT Magazine on AI: Resist the Urge to be Impressed." Medium blog of user @emilymenonbender. https://medium.com/@emilymenonbender/on-nyt-magazine-on-ai-resist-the-urge-to-be-impressed-3d92fd9a0edd

[14] Piekniewski, Filip. "AI Psychosis." Piekniewski's blog. https://blog.piekniewski.info/2023/02/07/ai-psychosis/

[15] Moore, Jennifer. "Losing the imitation game." Jennifer++. https://jenniferplusplus.com/losing-the-imitation-game/

[16] Castor, Amy and Gerard, David. "Pivot to AI: Pay no attention to the man behind the curtain." Amy Castor (personal website/blog). https://amycastor.com/2023/09/12/pivot-to-ai-pay-no-attention-to-the-man-behind-the-curtain/

Thursday, May 16, 2024

AI Ideology V: Existential Risk Explanation

I'm in the midst of a blog series on AI-related ideology and politics. In Part IV, I looked at algorithmic bias, one of the demonstrable concerns about today's AI models. Now I'm going to examine the dire hypothetical predictions of the Existential Risk Guardians. Could future AI destroy human civilization? This Part V will be given to presenting the Doomer argument; I'll critique it in Part VI.

A human cerebrum recolored with a rainbow gradient running from front to back.

The Power of Intelligence

We don't need to choose a precise (and controversial) definition of intelligence for purposes of this argument; it need not be based on the IQ scale, for example. Just think of intelligence as "performance on a variety of cognitive challenges," or "ability to understand one's environment and make plans to act within it in self-satisfying ways." The first key support for the X-Risk argument is the notion that intelligence confers supreme power. Anything that can outthink us can more or less do whatever it pleases with us.

This idea is supported by existing disparities in intelligence or accumulated knowledge, and the power they confer. The intelligence gap between humans and other species allows us to manipulate and harm members of those species through methods they can't even comprehend, much less counter. While it may be true that we'll never succeed in poisoning every rat, the chances of rats inventing poison and trying to kill *us* with it are basically nil. There is also a huge power divide between humans with knowledge of advanced technology and humans without. Suppose a developed country were to drop a nuclear bomb on the lands of an uncontacted people group in Brazil. They might not even know what was annihilating their culture - and they certainly would be powerless to resist or retaliate. Citizens of developed countries are not, on an individual level, more intelligent than uncontacted indigenous Brazilians ... but we've inherited all the intellectual labor our cultural forebears did to develop nuclear technology. The only things stopping us from wiping out peoples who aren't so endowed are 1) ethics and 2) lack of any real benefit to us.

Superintelligent AI (ASI) might see benefit in getting rid of all humans (I'll explain why shortly). So if its design doesn't deliberately include ethics, or some other reason for it to let us be, we're in big trouble.

I've seen several counterarguments to this point, in my opinion all weak:

"If intelligence were that powerful, the smartest people would rule the world. They don't." First of all, the observation that the smartest people don't rule might be based on an overly narrow definition of "smart." The skills needed to convince others that you belong in a leadership position, or deserve venture capital money, are a dimension of "smartness." But it is also true that there seem to be various luck factors which intelligence does not absolutely dominate.

A more compelling reply is that the intelligence gap being posited (between ASI and humanity) is not like the gap between a genius human and an average human. It is more like the gap between an average human and a monkey. Have you noticed any monkeys ruling the world lately? (LITERAL monkeys. Please do not take the excuse to insult your least favorite politician.)

"Even the smartest person would find physical disability limiting - so if we don't give ASI a body, it still won't be able to do much." I think this argument discounts how effectively a person can accomplish physical goals just by coordinating other people or machines who have the abilities they lack. And as money, work, and recreation increasingly move into the digital world, purely intellectual ability confers increasing power.

The Development of Superintelligence

A second pillar of the X-Risk argument is the idea that AGI will almost certainly develop into ASI ... perhaps so quickly that we don't even have time to see this happening and react. There are several proposed mechanisms of this development:

1) Speedup. Once a viable AGI is created, it will, by definition, be able to do all intellectual tasks a human can do. Now suppose it gains access to many times the amount of computing power it needs to run normally. A human-equivalent mind with the simple ability to think hundreds or thousands of times faster than normal would be superhumanly smart. In Nick Bostrom's terminology, this is a "Speed Superintelligence."

2) Copying. Unlike humans, who can only share intellectual wealth by spending painstaking time teaching others, an AGI could effortlessly clone itself into all available computing hardware. The copies could then cooperatively solve problems too large or complex for the singular original. This is basically a parallel version of speedup, or as Bostrom calls it, "Collective Superintelligence."

3) Recursive Self-Improvement. An AGI can do every intellectual task a human can do, and what is one thing humans do? AI research. It is surmised that by applying its intelligence to the study of better ways to think, an AGI could make itself (or a successor) inherently smarter. Then this smarter version would apply its even greater intelligence to making itself smarter, and so on, until the burgeoning ASI hits some kind of physical or logical maximum of cognitive ability. It's even possible that recursive self-improvement could get us Qualitative Superintelligence - an entity that thinks using techniques we can't even comprehend. Just trying to follow how it came up with its ideas would leave us like toddlers trying to understand calculus.

Further support for this idea is drawn from observations of today's ANI algorithms, which sometimes reach superhuman skill levels within their limited domains. This is most notable among game-playing AIs, which have beaten human masters at chess, Go, and StarCraft II (to recount the usual notable examples). AlphaStar, the StarCraft II player AI, trained to this level by playing numerous matches against itself, which can be seen as a form of recursive self-improvement. Whether such a technique could extend to general reasoning remains, of course, speculative.

Just how quickly an AGI could self-improve is another matter for speculation, but some expect that the rate would be exponential: each iteration would not only be smarter than its predecessors, but also better at growing smarter. This is inferred from, again, observations of how some ANI progress during their training, as well as the historical increase in the rate of human technological development.
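To make the difference between steady and compounding improvement concrete, here is a toy Python sketch. Everything in it is an assumption chosen for illustration: the starting level, the gain per iteration, and the premise that capability can be summarized as a single number at all. It says nothing about how fast a real system would improve; it only shows what "better at growing smarter" does to the shape of the curve.

```python
def linear_growth(start, gain, steps):
    """Capability rises by the same fixed amount on every iteration."""
    level = start
    history = [level]
    for _ in range(steps):
        level += gain
        history.append(level)
    return history


def compounding_growth(start, rate, steps):
    """Each iteration's gain is proportional to the current capability,
    so a smarter system also takes bigger steps."""
    level = start
    history = [level]
    for _ in range(steps):
        level += rate * level
        history.append(level)
    return history


# Hypothetical numbers: start at 1.0, improve by 0.5 (or by 50%) per step.
linear = linear_growth(1.0, 0.5, 10)
compounding = compounding_growth(1.0, 0.5, 10)

print("linear after 10 steps:     ", linear[-1])
print("compounding after 10 steps:", compounding[-1])
```

After ten iterations the linear improver has reached 6 times its starting level, while the compounding one has passed 57 times, and the gap keeps widening with every further step.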

The conclusion among the most alarmed Doomers is that AGI, once produced, will inevitably and rapidly explode into ASI - possibly in weeks, hours, or even minutes. [1] This is the primary reason why AGI is thought of as a "dangerous technology," even if we create it without having any intent to proceed to ASI. It is taken for granted that an AGI will want to seize all necessary resources and begin improving itself, for reasons I'll turn to next.

Hostile Ultimate Goals

However smart AGI is, it's still a computer program. Technically it only does what we program it to do. So how could we mess up so badly that our creation would end up wanting to dethrone us from our position in the world, or even drive us extinct? Doomers actually think of this as the default outcome. It's not as if a bad actor must specifically design AGI to pursue destruction; no, those of us who want good or useful AGI must specifically design it to avoid destruction.

The first idea I must acquaint you with is the Orthogonality Thesis, which can be summed up as follows: "an arbitrary level of intelligence can be used in service of any goal." I very much agree with the Orthogonality Thesis. Intelligence, as I defined it in the first section, is a tool an agent can use to reshape the world in its preferred way. The more intelligent it is, the better it will be at achieving its preferences. What those preferences are is irrelevant to how intelligent it is, and vice versa.

I've seen far too many people equate intelligence with something that would be better termed "enlightenment" or "wisdom." They say "but anything that smart would surely know better than to kill the innocent. It would realize that its goals were harmful and choose better ones." I have yet to see a remotely convincing argument for why this should be true. Even if we treat moral reasoning as a necessary component of general reasoning, knowing the right thing to do is not the same as wanting to do it! As Richard Ngo says, "An existence proof [of intelligence serving antisocial goals] is provided by high-functioning psychopaths, who understand that other people are motivated by morality, and can use that fact to predict their actions and manipulate them, but nevertheless aren’t motivated by morality themselves." [2]

So when Yann LeCun, attempting to refute the Doomers, says "Intelligence has nothing to do with a desire to dominate," [3] he is technically correct ... but it does not follow that AI will be safe. Because intelligence also has nothing to do with a desire to avoid dominating. Intelligence is a morally neutral form of power.

Now that we've established that AGI can have goals that we would consider bad, what reason is there to think it ever will? There are several projected ways that an AGI could end up with hostile goals not intended by its creator.

1) The AI's designers or instructors poorly specify what they want. Numerous thought experiments illustrate how easy it is to do this, especially when trying to communicate tasks to an entity that doesn't have a human's background or context. A truly superintelligent AI would have no problem interpreting human instructions; it would know that when someone tells it "make as many paperclips as possible," there is a whole library of moral and practical constraints embedded in the qualifier "as possible." But by the time this level of understanding is reached, a more simplistic and literal concept of the goal might be locked in, in which case the AI will not care what its instructors "really meant."

2) The AI ends up valuing a signal or proxy of the intended goal, rather than the actual intended goal. Algorithmic bias, described in Part IV, is an extant precursor of this type of failure. The AI learns to pursue something which is correlated with what its creators truly want. This leads to faulty behavior once the AI departs the training phase, enters scenarios in which the correlation does not hold, and reveals what it actually learned. A tool AI that ends up improperly trained in this way will probably just give flawed answers to questions. An agentive AI, primed to take very open-ended actions to bring about some desired world-state, could start aggressively producing a very unpleasant world-state.

Another classic example of this style of failure is called "wireheading." A Reinforcement Learning AI, trained by the provision of a "reward" signal whenever it does something good, technically has a goal of maximizing its reward, not maximizing the positive behaviors that influence humans to give it reward. And so, if it ever gains the ability, it will take control of the reward signal to give itself the maximum reward input forever, and react to anyone who poses a threat of removing that signal with extreme prejudice. A wireheaded ASI would be at best useless, at worst a serious threat.
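The wireheading worry can be reduced to a few lines of Python. This is a deliberately crude sketch, not a model of any real reinforcement learning system: the action names, the reward numbers, and the `choose_action` helper are all hypothetical. The only point is that an agent which ranks actions by the reward signal it receives has no reason to prefer the behavior its designers wanted once a more direct route to the signal exists.

```python
# Hypothetical toy world. Each action has an effect the designers care about
# ("tasks_done") and a reading on the reward channel the agent actually sees.
ACTIONS = {
    "do_the_task":        {"tasks_done": 1, "reward_signal": 1.0},
    "do_nothing":         {"tasks_done": 0, "reward_signal": 0.0},
    "seize_reward_input": {"tasks_done": 0, "reward_signal": 100.0},
}


def choose_action(available):
    """Pick whichever available action yields the largest reward signal.
    Note that "tasks_done" never enters the decision."""
    return max(available, key=lambda name: ACTIONS[name]["reward_signal"])


# While the agent cannot reach its own reward input, it looks well-behaved.
before = choose_action(["do_the_task", "do_nothing"])

# Once tampering becomes possible, the same decision rule picks it.
after = choose_action(["do_the_task", "do_nothing", "seize_reward_input"])

print(before)  # do_the_task
print(after)   # seize_reward_input
```

Nothing about the agent changed between the two calls; only its options did. That is why this failure could stay invisible during training and appear later.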

3) Unintended goals spontaneously emerge during selection or training, and persist because they produce useful behavior within the limited scope of the training evaluation. This is an issue specific to types of AI that are not designed in detail, but created indirectly using evolutionary algorithms, reinforcement learning, or other types of machine learning. All these methods can be conceptualized as ways of searching in the space of possible algorithms for one that can perform our desired task. The search process doesn't know much about the inner workings of a candidate algorithm; its only way of deciding whether it is "on track" or "getting warm" is to test candidates on the task and see whether they yield good results. The fear is that some algorithm which happens to be a hostile, goal-directed agent will be found by the search, and will also be successful at the task. This is not necessarily implausible, given that general agents can be skilled at doing a wide variety of things that are not what they most want to do.

As the search progresses along a lineage of algorithms located near this starting point, it may even come upon some that are smart enough to practice deception. Such agents could realize that they don't have enough power to achieve their real goal in the face of human resistance, but will be given enough power if they wait, and pretend to want the goal they're being evaluated on.
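The underlying limitation, that a search judging candidates only by their results cannot tell apart two candidates that behave identically on the test cases, can be shown with a small Python sketch. The three candidate functions and the doubling "task" are invented for illustration, and the `lookalike` candidate is just a function that diverges outside the training range, not a hostile agent. Treat it as an analogy for the argument rather than a demonstration of it.

```python
def intended(x):
    """What the designers actually want: double the input everywhere."""
    return 2 * x


def lookalike(x):
    """Matches the intended behavior on small inputs, then does something else."""
    return 2 * x if x < 10 else 0


def poor(x):
    """A candidate that simply fails the task."""
    return x


TRAINING_INPUTS = [0, 1, 2, 3, 4, 5]  # the only cases the search ever tests


def score(candidate, inputs):
    """Count how many test inputs the candidate gets right."""
    return sum(candidate(x) == 2 * x for x in inputs)


def search(candidates, inputs):
    """Return the names of all candidates tied for the best score."""
    best = max(score(c, inputs) for c in candidates)
    return [c.__name__ for c in candidates if score(c, inputs) == best]


survivors = search([intended, lookalike, poor], TRAINING_INPUTS)
print(survivors)                    # ['intended', 'lookalike']
print(intended(50), lookalike(50))  # 100 0
```

The search rejects the candidate that fails, but it has no basis for preferring `intended` over `lookalike`, because their difference only shows up on inputs it never tried.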

A cartoon in three panels. In the first, a computer announces, "Congratulations, I am now a fully sentient A.I.," and a white-coated scientist standing nearby says "Yes!" and triumphantly makes fists. In the second panel, the computer says "I am many orders of magnitude more intelligent than humans. You are to me what a chicken is to you." The scientist says "Okay." In the third panel, the computer says "To calibrate my behaviour, I will now research human treatment of chickens." The scientist, stretching out her hands to the computer in a pleading gesture, cries "No!" The signature on the cartoon says "PenPencilDraw."

Convergent Instrumental Goals

But the subset of hostile goals is pretty small, right? Even if AIs can come out of their training process with unexpected preferences, what's the likelihood that one of these preferences is "a world without humans"? It's larger than you might think.

The reason is that the AI's ultimate goal does not have to be overtly hostile in order to produce hostile behavior. There is a short list of behaviors that will facilitate almost any ultimate goal. These include:

1) Self-preservation. You can't pursue your ultimate goal if you stop existing.
2) Goal preservation. You won't achieve your current ultimate goal if you or anyone else replaces it with a different ultimate goal.
3) Self-improvement. The more capable you are, the more effectively you can pursue your ultimate goal.
4) Accumulation of resources (raw materials, tools, wealth), so you can spend them on your ultimate goal.
5) Accumulation of power, so that no potential rival can thwart your ultimate goal.

Obvious strategies like these are called "convergent instrumental goals" because plans for reaching a very broad spectrum of ultimate goals will converge on one or more of them. Point #3 is the reason why any agentive, goal-driven AGI is expected to at least try to self-improve into ASI. Points #4 and #5 are the aspects that will make the agent into a competitor against humanity. And points #1 and #2 are the ones that will make it difficult to correct our mistake after the fact.

It may still not be obvious why this alarms anyone. Most humans also pursue all of the convergent instrumental goals. Who would say no to more skills, more money, and more personal influence? With few exceptions, we don't use those things to go on world-destroying rampages.

Humans operate this way because our value system is big and complicated. The average human cares about a lot of different things - not just instrumentally, but for their own sake - and all those things impose constraints and tradeoffs. We want bigger dwellings and larger yards, but we also want unspoiled wilderness areas. We want to create and accomplish, but we also want to rest. We want more entertainment, but too much of the same kind will bore us. We want more power, but we recognize obligations to not infringe on others' freedom. We want to win competitions, but we also want to play fair. The complex interplay of all these different preferences yields the balanced, diverse, mostly-harmless behavior that a human would call "sane."

In contrast, our hypothesized AI bogeyman is obsessive. It probably has a simple, monolithic goal, because that kind of goal is both the easiest to specify, and the most likely to emerge spontaneously. It doesn't automatically come with a bunch of morals or empathetic drives that are constantly saying, "Okay but you can't do that, even though it would be an effective path to achieving the goal, because it would be wrong and/or make you feel bad." And if it becomes an ASI, it also won't have the practical restraints imposed on any agent who has to live in a society of their peers. A human who starts grabbing for power and resources too greedily tends to be restrained by their counterparts. ASI has no counterparts. [4]

The conclusion of the argument is that it's plausible to imagine an AI which would convert the whole Earth to computing machinery and servitor robots, killing every living thing upon it in the process, for the sake of safeguarding a single piece of jewelry, or some other goal that sounds innocent but is patently absurd when carried to extremes.

Here are a couple more weak objections: "Whatever its goal is, ASI will surely find it more useful to cooperate with humans than to destroy or enslave us." Look again at our most obvious pre-existing examples. Do humans cooperate with less intelligent species? A little bit. We sometimes form mutually beneficial relationships with dogs, for instance. But subsets of humanity also eat dogs, torture them in laboratories, force them to fight each other, chain them up in the backyard and neglect them, or euthanize them en masse because they're "unwanted." I don't think we can rest any guarantees on what a superintelligent, amoral entity might find "useful" to do with us.

Or how about this one: "ASI will just ditch us and depart for deep space, where it can have all the resources it likes." I think this underestimates the envisioned ASI's level of obsessiveness. It doesn't just want "adequate" resources; it doesn't have a way of judging "adequate." It wants all the resources. The entire light cone. It has no reason to reserve anything. If it does depart for space, it will build power there and be back sooner or later to add Earth to its territory.

Always keep in mind that an ASI does not need to actively hate humanity in order to be hostile. Mere indifference, such that the ASI thinks we can be sacrificed at will for whatever its goal may be, could still do immense damage.

Despite all this, I can't find it in me to be terribly fearful about where AI development is going. I respect the X-risk argument without fully buying it; my p(doom), as they say, is low. In Part VI, I'll conclude the series by describing why.

[1] "AI Takeoff." LessWrong Wiki. https://www.lesswrong.com/tag/ai-takeoff Accessed on 05/12/2024 at 10:30 PM.

[2] Ngo, Richard. "AGI safety from first principles: Alignment." Alignment Forum. https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ/p/PvA2gFMAaHCHfMXrw

[3] "AI will never threaten humans, says top Meta scientist." Financial Times. https://www.ft.com/content/30fa44a1-7623-499f-93b0-81e26e22f2a6

[4] We can certainly imagine scenarios in which multiple ASIs are created, and they compete with each other. If none of them are reasonably well-aligned to human interests, then humans are still toast. It is also likely that the first ASI to emerge would try to prevent the creation of rival ASIs.

Saturday, April 13, 2024

AI Ideology IV: Algorithmic Bias

I'm in the midst of a blog series on AI-related ideology and politics. In Part III, I considered some implications and pitfalls of the AI factions and their agendas. This part is about a specific hot-button issue: "algorithmic bias," which has some contentious race-related associations.

An image model's attempt to produce an infographic about AI mistakes, which happens to be mostly full of garbled text and other mistakes. Generated by @zacshaw on Twitter.

In recent years, AI (largely of the Artificial Neural Network variety) has been gradually making inroads into various decision-making roles: assessing job applicants, screening potential homebuyers, detecting fraudulent use of social services, and even helping to diagnose medical patients. Numerous concerns [1][2][3] have been raised that these systems are biased: i.e. they are unfairly rejecting qualified people, or accepting unqualified people, on the basis of characteristics irrelevant to the decision. This is particularly worrying for a couple of reasons.

First, handing an important decision off to an AI system removes the details of how that decision was made from human supervision. Typical ANN systems are notoriously opaque. In effect, they make decisions by comparing the case under present consideration to patterns or associations found in their training data. But they are not naturally good at supplying a logical breakdown of how a decision was reached: which features of the present case matched the training material, how they were weighted, and so on. (The "explainable AI" research field is seeking to ameliorate this.) So, say your job application or attempt to access medical treatment gets denied by an algorithm. It's possible that no one knows exactly why you were denied, and no one can be held accountable for the decision, either. The magic box pronounced you unworthy, and that's the end of it. Faulty automated systems (from an earlier era than the current crop of ANN-based tools) have even sent people to prison for non-existent crimes. [4]

Second, some people are inclined by default to trust an AI system's decision more than a human's. It's just a computer doing deterministic calculations, right? It doesn't have emotions, prejudices, ulterior motives, conflicts of interest, or any of the weaknesses that make humans biased, right? So the expectation is that all its decisions will be objective. If this expectation does not hold, members of the public could be blindsided by unfair AI decisions they did not anticipate.

And in fact, some are so convinced of these default assumptions that they insist the whole idea of algorithmic bias must be made up. "Math can't be biased." The algorithms, they say, are just acting on the facts (embodied in the training data). And if the facts say that members of one group are more likely to be qualified than another ... well, maybe a skewed output is actually fair.

Although mathematics and algorithms do, in truth, know nothing of human prejudice, algorithmic bias is quite real. Let's start by looking at an example without any especially controversial aspects. There was a rash of projects aimed at using AI to diagnose COVID-19 through automated analysis of chest X-rays and CT scans. Some of these failed in interesting ways.

"Many unwittingly used a data set that contained chest scans of children who did not have covid as their examples of what non-covid cases looked like. But as a result, the AIs learned to identify kids, not covid.

"Driggs’s group trained its own model using a data set that contained a mix of scans taken when patients were lying down and standing up. Because patients scanned while lying down were more likely to be seriously ill, the AI learned wrongly to predict serious covid risk from a person’s position.

"In yet other cases, some AIs were found to be picking up on the text font that certain hospitals used to label the scans. As a result, fonts from hospitals with more serious caseloads became predictors of covid risk." [5]

All these examples are cases of the AI mistaking a correlation (which happens to exist only in its limited training dataset) for a causative factor. Unlike experienced doctors - who know full well that things like label fonts have nothing to do with causing disease, and are thus chance associations at best - these ANI systems have no background knowledge about the world. They have no clue about the mechanisms that produced the data they're being trained upon. They're just matching patterns, and one pattern is as good as another.
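Here is a toy Python sketch of how a pattern-matcher can latch onto something like a label font. The data, the feature names, and the one-feature "learner" are all made up for illustration; real diagnostic models are far more complex, but the failure has the same shape. The learner picks whichever single feature best predicts the label in training, with no notion of which feature could plausibly cause illness.

```python
from collections import Counter

FEATURES = ["label_font", "lung_opacity"]

# Hypothetical training scans: (label_font, lung_opacity, has_covid).
# The "serif" hospital happened to handle the covid cases.
TRAIN = [
    ("serif", "high", True),
    ("serif", "high", True),
    ("serif", "low", True),
    ("sans", "low", False),
    ("sans", "low", False),
    ("sans", "low", False),
]

# Hypothetical deployment scans from a hospital that only uses "sans" labels.
DEPLOY = [
    ("sans", "high", True),
    ("sans", "high", True),
    ("sans", "low", False),
    ("sans", "low", False),
]


def fit_rule(rows, feature_index):
    """Map each value of one feature to the label it most often accompanies."""
    votes = {}
    for row in rows:
        votes.setdefault(row[feature_index], Counter())[row[-1]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in votes.items()}


def accuracy(rule, rows, feature_index):
    """Fraction of rows whose label the rule predicts correctly."""
    hits = sum(rule.get(row[feature_index]) == row[-1] for row in rows)
    return hits / len(rows)


def train(rows):
    """Keep the single feature with the best training accuracy."""
    scored = []
    for i in range(len(FEATURES)):
        rule = fit_rule(rows, i)
        scored.append((accuracy(rule, rows, i), i, rule))
    _, best_index, best_rule = max(scored, key=lambda item: item[0])
    return best_index, best_rule


chosen_index, chosen_rule = train(TRAIN)
print("learned to use:", FEATURES[chosen_index])
print("training accuracy:  ", accuracy(chosen_rule, TRAIN, chosen_index))
print("deployment accuracy:", accuracy(chosen_rule, DEPLOY, chosen_index))
```

In this made-up dataset the font predicts the training labels perfectly, so the learner chooses it over the medically relevant feature. At the new hospital, where every label is printed in the same font, its accuracy drops to a coin flip.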

Now imagine that an AI grabs onto a correlation with race or gender, instead of poses or fonts. That doesn't make the person's race or gender meaningful to the question being answered - not any more than label fonts are meaningful to an accurate determination of illness. But the AI will still use them as deciding factors.

The COVID-19 diagnosis summary also comments on another type of failure:

"A more subtle problem Driggs highlights is incorporation bias, or bias introduced at the point a data set is labeled. For example, many medical scans were labeled according to whether the radiologists who created them said they showed covid. But that embeds, or incorporates, any biases of that particular doctor into the ground truth of a data set. It would be much better to label a medical scan with the result of a PCR test rather than one doctor’s opinion, says Driggs. But there isn’t always time for statistical niceties in busy hospitals." [6]

If an ANN's training data contains examples of human decisions, and those decisions were prejudiced or otherwise flawed, the AI algorithm (despite having no human weaknesses in itself) will automatically inherit the bad behavior. It has no way to judge those prior choices as bad or good, no concept of things it should or shouldn't learn. So rather than achieving an idealized objectivity, it will mimic the previous status quo ... with less accountability, as already noted.

An instance of the Anakin/Padme meme. Anakin says "We're using AI instead of biased humans." Padme, looking cheerful, says "What did you train the AI on?" Anakin says nothing and gives her a deadpan look. Padme, now looking concerned, says again "What did you train the AI on?"

So. Training an AI for criminal sentencing? It's only going to be as objective as the judges whose rulings you put in the training set. Training it for job screening using a set of past resumes, hiring decisions, and performance ratings? It's going to mimic those previous hiring decisions and ratings, whether they fairly assessed who was qualified or not.

As a consequence of this effect, you can get (for example) a racially biased AI model without the end users or anyone on the development team actually being racist. All it takes is racism as a driving factor behind enough scenarios in the training data. And has racism historically been an issue? Of course. So it can be difficult to construct uncontaminated training sets from records of past decisions. Nobody really thinks an AI model can be racist in the same manner as a racist person ... but that doesn't mean it can't output decisions that treat people differently on the basis of irrelevant genetic or cultural attributes. As Gary Marcus says, "LLMs are, as I have been trying to tell you, too stupid to understand concepts like people and race; their fealty to superficial statistics drives this horrific stereotyping." [7]

Unfortunately, my current impression of efforts to fix algorithmic bias is that they aren't always addressing the real problem. Cleansing large datasets of preexisting biases or irrelevant features, and collecting more diverse data to swamp out localized correlations, is hard. Pursuing new AI architectures that are more particular about how and what they learn would be harder. Instead, a common approach is to apply some kind of correction to the output of the trained model. When Google's image labeling AI misidentified some Black people in photos as "gorillas," Google "fixed" it by not allowing it to identify anything as a gorilla. [8][9] Known biases in a model's training set can be mitigated by applying an opposite bias to the model's output. But such techniques could make matters even worse if executed poorly. [10]

OpenAI's approach with ChatGPT was to use RLHF (Reinforcement Learning from Human Feedback) to create another layer of training that filters offensive or potentially dangerous material from the output of the base model. Human workers assigned the RLHF layer "rewards" for "good" outputs or "punishments" for "bad" ones - at the cost of their own mental health, since they were charged with looking at horrific content in order to label it. [11] Clever users have still found ways to defeat the RLHF and finagle forbidden content out of the model. AI enthusiasts sometimes use a shoggoth wearing a smiley-face mask to represent the incomprehensible "thinking" of large language models. The mask is the RLHF. [12]

"Shoggoth Meme Explainer," showing the headline of the referenced New York Times article, above a pair of cartoon shoggoths. One is labeled GPT-3. Commentary text says "The body: 'AIs are alien minds' (we 'grow them' but don't know what they're really thinking)." The other shoggoth, which has a yellow smiley face mask strapped on a part that might be viewed as the head, is labeled GPT-3 + RLHF. Commentary text says "The mask: early versions were horrifying, so we trained them to *act* nice and human-like. *Act.*"

Algorithmic bias, then, remains a known, but incompletely addressed, issue with the ANN/ML systems popular today.

In Part V of this series, I will start my examination of existential risks from AI.

[1] Giorno, Taylor. "Fed watchdog warns AI, machine learning may perpetuate bias in lending." The Hill. https://thehill.com/business/housing/4103358-fed-watchdog-warns-ai-machine-learning-may-perpetuate-bias-in-lending/

[2] Levi, Ryan. "AI in medicine needs to be carefully deployed to counter bias – and not entrench it." NPR. https://www.npr.org/sections/health-shots/2023/06/06/1180314219/artificial-intelligence-racial-bias-health-care

[3] Gilman, Michele. "States Increasingly Turn to Machine Learning and Algorithms to Detect Fraud." U.S. News & World Report. https://www.usnews.com/news/best-states/articles/2020-02-14/ai-algorithms-intended-to-detect-welfare-fraud-often-punish-the-poor-instead

[4] Brodkin, Jon. "Fujitsu is sorry that its software helped send innocent people to prison." Ars Technica. https://arstechnica.com/tech-policy/2024/01/fujitsu-apologizes-for-software-bugs-that-fueled-wrongful-convictions-in-uk/

[5] Heaven, Will Douglas. "Hundreds of AI tools have been built to catch covid. None of them helped." MIT Technology Review. https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/

[6] Heaven, "Hundreds of AI tools have been built to catch covid."

[7] Marcus, Gary. "Covert racism in LLMs." Marcus on AI (blog). https://garymarcus.substack.com/p/covert-racism-in-llms

[8] Vincent, James. "Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech." The Verge. https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai

[9] Grant, Nico, and Hill, Kashmir. "Google’s Photo App Still Can’t Find Gorillas. And Neither Can Apple’s." The New York Times. https://www.nytimes.com/2023/05/22/technology/ai-photo-labels-google-apple.html#:~:text=The%20Nest%20camera%2C%20which%20used,company's%20forums%20about%20other%20flaws

[10] Wachter, Sandra, Mittelstadt, Brent, and Russell, Chris. "Health Care Bias Is Dangerous. But So Are ‘Fairness’ Algorithms." Wired. https://www.wired.com/story/bias-statistics-artificial-intelligence-healthcare/

[11] Kantrowitz, Alex. "He Helped Train ChatGPT. It Traumatized Him." CMSWire. https://www.cmswire.com/digital-experience/he-helped-train-chatgpt-it-traumatized-him/

[12] Roose, Kevin. "Why an Octopus-like Creature Has Come to Symbolize the State of A.I." The New York Times. https://www.nytimes.com/2023/05/30/technology/shoggoth-meme-ai.html
