WriterOfMinds: Other People's AI

Showing posts with label Other People's AI. Show all posts

Thursday, January 9, 2025

Book Review: "Growing Up with Lucy"

This book by Steve Grand has been on my list for a while. It's his personal account of how he built a robot at home, with every goal of rivaling some of the most sophisticated research efforts at the time. Steve Grand, by the way, is the mastermind behind the popular Creatures series of artificial life simulation video games. His books were recommended to me several years ago by Twitter buddy Artem (@artydea), whom I unfortunately haven't seen around much recently. Thanks, Artem, if you're still out there.

Lucy, without her fur on. Photo credit: Creatures wikia.

The magnum opus described in this book is the eponymous Lucy, who resembles an orangutan ... or at least the upper half of one. Lucy has no legs, but does have actuated arms, visual and auditory sensors, and a mouth and speech system. Grand aimed for a holistic approach (rather than focusing on just one function, such as vision) because he wanted to investigate the interactions and possible commonalities among the various sensory modalities and motor systems.

Once Grand started introducing himself, I quickly realized that I was hearing from someone more or less like me: an educated-layman tinkerer, working independently. Grand produced Lucy on his own initiative, without the backing of a university or corporation, though toward the end of the project he received a grant from NESTA (the UK's National Endowment for Science, Technology, and the Arts). But the substance of Grand's work diverges from mine. He put heavy investment into all the sensorimotor aspects of intelligence that I'm completely leaving out of Acuitas. And Lucy's control systems, while not explicitly biomimetic, are very brain-inspired; whereas I draw inspiration from the human mind as an abstraction, and don't care much about our wetware at all. So it was fun to read an account of somebody who was working on roughly the same level as myself, but with different strategies in a different part of the problem space.

On the other hand, I was disappointed by how much of the book was theory rather than results. I got the most enjoyment out of the parts in which Grand described how he made a design decision or solved a problem. His speculations about what certain neuroscience findings might mean, or how he *could* add more functionality in the future, were less interesting to me ... because speculations are a dime a dozen, especially in this field. Now I do want to give Grand a lot of credit: he actually built a robot! That's farther than a lot of pontificators seem to get. But the book is frank about Lucy being very much unfinished at time of publication. Have a look around this blog. If there's a market for writeups of ambitious but incomplete projects, then where's *my* book deal?

In the book, Grand said that Lucy was nearly capable of learning from experience to recognize a banana and point at it with one of her arms. It sounded like she had all the enabling features for this, but wasn't doing it *reliably* yet. I did a little internet browsing to see how much farther Lucy got after the book went to press. From what I could find, her greatest accomplishment was learning the visual difference between bananas and apples, and maybe demonstrating her knowledge by pointing. [1] That's nothing to sneeze at, trust me. But it's a long way from what Grand's ambitions for Lucy seemed to be, and in particular, it leaves his ideas about higher reasoning and language untested. Apparently he did not just get these things "for free" after figuring out some rudimentary sensorimotor intelligence. Grand ceased work on Lucy in 2006, and she is now in the care of the Science Museum Group. [2]

Why did he stop? He ran out of money. Grand worked on Lucy full-time while living off his savings. The book's epilogue describes how NESTA came through just in time to allow the project to continue. Once the grant was expended, Lucy was set aside in favor of paying work. I doubt I can compete with Grand's speed of progress by playing with AI on the side while holding down a full-time job ... but I might have the advantage of sustainability. Grand started in 2001 and gave Lucy about five years. If you don't count the first two rudimentary versions, Acuitas is going on eight.

Grand identifies not neurons, but the repeating groups of neurons in the cortex, as the "fundamental unit" of general intelligence and the ideal level at which to model a brain. He doesn't use the term "cortical column," but I assume that's what he's referring to. Each group contains the same selection of neurons, but the wiring between them is variable and apparently "programmed" by experiential learning, prompting Grand to compare the groups with PLDs (the forerunners of modern FPGAs). He conceptualizes intelligence as a hierarchy of feedback control loops, an idea I've also seen expounded by Filip Piekniewski. [3] It's a framing I rather like, but I still want to be cautious about hanging all of intelligence on a single concept or method. I don't think any lone idea will get you all the way there (just as this one did not get Grand all the way there).

Lucy's body is actuated by electric motors, with linkages that help them behave more like "muscles." Grand didn't try pneumatics or hydraulics, because he thought they would be too difficult to control. I guess we'll see, eh?

Two chapters at the book's tail end move from technical affairs into philosophy. The first addresses safety concerns and fears of killer robots. While I agree with his basic conclusion that AI is not inevitably dangerous, I found his arguments dated and simplistic. I doubt they would convince anybody acquainted with recent existential risk discourse, which probably wasn't in the public consciousness when Grand was building Lucy. (LessWrong.com was launched in 2009; Yudkowski's "Harry Potter and the Methods of Rationality," Scott Alexander's "Slate Star Codex" blog, and Bostrom's Superintelligence all came later. See my "AI Ideology" article series for more about that business.)

The final chapter is for what I'll call the "slippery stuff": consciousness and free will. Grand avoids what I consider the worst offenses AI specialists commit on these topics. He admits that he doesn't really know what consciousness is or what produces it, instead of advancing some untestable personal theory as if it were a certainty. And he doesn't try to make free will easier by redefining it as something that isn't free at all. But I thought his discussion was, once again, kind of shallow. The only firm position he takes on consciousness is to oppose panpsychism, on the grounds that it doesn't really explain anything: positing that consciousness pervades the whole universe gets us no farther toward understanding what's special about living brains. (I agree with him, but there's a lot more to the consciousness discussion.) And he dismisses free will as a logical impossibility, because he apparently can't imagine a third thing that is neither random nor feed-forward deterministic. He doesn't consider that his own imagination might be limited, or dig into the philosophical literature on the topic; he just challenges readers to define self-causation in terms of something else. (But it's normal for certain things to be difficult to define in terms of other things. Some realities are just fundamental.) It's one chapter trying to cover questions that could fill a whole book, so maybe I shouldn't have expected much.

On the whole, it was interesting to study the path walked by a fellow hobbyist and see what he accomplished - and what he didn't. I wonder whether I'll do as well.

Until the next cycle,
Jenny

[1] Dermody, Nick. "A Grand plan for brainy robots." BBC News Online Wales (2004). http://news.bbc.co.uk/2/hi/uk_news/wales/3521852.stm

[2] Science Museum Group. "'Lucy' robot developed by Steve Grand." 2015-477 Science Museum Group Collection Online. Accessed 2 January 2025. https://collection.sciencemuseumgroup.org.uk/objects/co8465358/lucy-robot-developed-by-steve-grand.

[3] Piekniewski, Filip. "The Atom of Intelligence." Piekniewski's Blog (2023). https://blog.piekniewski.info/2023/04/16/the-atom-of-intelligence/

Sunday, June 16, 2024

AI Ideology VI: Existential Risk Critique

I come to you with the final installment in my series on AI-related ideology and politics. In Part V, I tried to briefly lay out the argument for existential risk from AI, along with what I consider the weaker counterpoints. Today I will conclude the series with a discussion of the counterarguments I find more interesting.

The Alignment Problem does not strike me as intractable

All the dangers laid out in the previous article are associated with misaligned AI agents - that is, agents that do not (in a broad sense) want what humans want. If we could produce an agentive superintelligence that did want what we want, it would pursue our goals just as aggressively as hostile superintelligence is expected to work against them. So all the fear of doom evaporates if the Alignment Problem is feasible to solve, at or before the time when AGI first comes on the scene.

Even though his followers have had two decades or so to think about the Problem, Yudkowsky insists that "We are not prepared. We are not on course to be prepared in any reasonable time window. There is no plan. Progress in AI capabilities is running vastly, vastly ahead of progress in AI alignment ..." [1] My own intuitions about alignment don't match up with this. To me it seems like a potentially difficult problem, but not any harder than the capability problem, i.e. the puzzle of how to create any AGI at all. The foundations of human values are somewhat obscure to us for the same reasons the mechanisms of our own intelligence are obscure; if we can discover one, we can discover the other. How can it be accurate to say that nobody has a plan for this?

It's possible that I feel this way because the work I'm doing, as well as my personal ideas of the best path to AGI, have little to do with ANNs and ML. A fair bit of the hand-wringing about AI alignment reads to me like this: "Heaven forbid that we *design* an AI to fulfill our complex desires - that would be too much work. No, we have to stick to these easy processes that draw trained models from the primordial ooze without any need for us to understand or directly architect them. This lazy approach won't reliably produce aligned AIs! OH NO!"

Since all of my systems are designed on purpose and the code is human-intelligible, they already have the level of transparency that ANN-builders dream of getting. I don't have to worry about whether some subsystem I've built just happens to contain an agent that's trying to optimize for a thing I never wanted, because none of my subsystems are auto-generated black boxes. I don't do haphazard emergent stuff, and I think that's one reason I feel more comfortable with my AI than some of these people feel with the mainstream approaches.

A selection of articles pulled from Alignment Forum provides evidence that many issues Existential Risk Guardians have identified are tied to particular techniques:

"In general, we have no way to use RL to actually interpret and implement human wishes, rather than to optimize some concrete and easily-calculated reward signal." [2]

"For our purposes, the key characteristic of this research paradigm is that agents are optimized for success at particular tasks. To the extent that they learn particular decision-making strategies, those are learned implicitly. We only provide external supervision, and it wouldn't be entirely wrong to call this sort of approach 'recapitulating evolution', even if this isn't exactly what is going on most of the time.

As many people have pointed out, it could be difficult to become confident that a system produced through this sort of process is aligned - that is, that all its cognitive work is actually directed towards solving the tasks it is intended to help with. The reason for this is that alignment is a property of the decision-making process (what the system is 'trying to do'), but that is unobserved and only implicitly controlled." [3]

"Traditional ML algorithms optimize a model or policy to perform well on the training distribution. These models can behave arbitrarily badly when we move away from the training distribution. Similarly, they can behave arbitrarily badly on a small part of the training distribution ... If we understood enough about the structure of our model (for example if it reflected the structure of the underlying data-generating process), we might be confident that it will generalize correctly. Very few researchers are aiming for a secure/competitive/scalable solution along these lines, and finding one seems almost (but not completely) hopeless to me." [4]

We could undercut a lot of these problems by taking alternate paths that do a better job of truly replicating human intelligence, and permit easier examination of how the system is doing its work.

Members of the Existential Risk Guardian/Doomer faction also like to frame all goal-directed agency in terms of "maximizing expected utility." In other words, you figure out a mathematical function that represents the sum of all you desire, and then you order your behavior in a way that maximizes this function's output. This idea fits in well with the way current mainstream AI works, but there are also game theoretic reasons for it, apparently. If you can frame your goals as a utility function and behave in a way that maximizes it, your results will be mathematically optimal, and other agents will be unable to take advantage of you in certain ways when making bets or deals. Obviously we humans don't usually implement our goals this way. But since this is, in some theoretical sense, the "best" way to think about goals, the Doomers assume that any superintelligence would eventually self-improve into thinking this way. If its goals were not initially specified as a utility function, it would distill them into one. [5]

Hence the Doomers think we must reduce the Alignment Problem to finding a utility function which, when maximized, yields a world that humans would find congenial. And the big fear arises from our not knowing how to do this. Our preferences and interests don't seem to be natively given as a mathematical function, and it is difficult to imagine transforming them into one without a large possibility for error.

Many struggles can be avoided by modeling human values in a more "natural" way: looking for methods of grounding concepts like deprivation/satisfaction, empathy, reciprocity, and fairness, instead of trying to reduce everything to a function. Technically it is possible to view any agent as maximizing some utility function, regardless of how it models its goals internally[6], but this not necessarily the most useful or transparent way to frame the situation!

And I consider it safe to model goals in the more "natural" way, because an ASI on a self-improvement quest would also recognize that 1) framing goals in terms of utility maximization, while theoretically optimal, is not always practical and 2) transforming goals from one specification into another carries potential for error. Since one of the convergent instrumental goals of any such self-aware agent is goal preservation, the ASI would be just as wary of these transformation errors as we are!

The alignment concern that seems most directly relevant to the work I'm personally doing is the possibility of oversimplified goal specification. But there are a number of strategies for managing this that I consider sound:

*Treat the AI's value system as a system - with all the potential for complexity that implies - instead of expecting just a few objectives or rules to carry all the weight.

*Embed into the AI some uncertainty about the quality of its own goal specifications, and a propensity to accept feedback on its actions or even adjustment of the goal specs. This is a form of "corrigibility" - keeping the AI sensitive to human opinions of its performance after it is complete and operational.

*Specify goals in an indirect manner so that the AI's concept of the goals will advance as the AI's general skill advances. For instance, provide goals in natural language and associate the AI's objective with their "real meaning," rather than trying to translate the goals into code, so that the AI's understanding of the goals can be enriched by improved understanding of human communication.

In short, I don't think the Doomers have proven that agentive AI is universally dangerous. Their arguments so far are focused on a subset of possible design pathways, none of which I am following.

This should go some way toward explaining why I'm not terribly worried about my own work (even if it gets anywhere near becoming AGI, and I make no claims that it will). But what about all the mainstream AI efforts that are rushing ahead as I write? Those don't manage to frighten me much either, but for different reasons.

I'm skeptical of intelligence explosion hazards

As noted in the previous article, one pillar of Doomer fears is the notion that AGI will probably become ASI, perhaps very quickly. A misaligned AGI has the power level of a bad human, and we already have plenty of those, so it is nothing to be especially worried about. Real danger requires a path from AGI to ASI. Let's think a little more about the most frightening type of ASI: qualitative superintelligence. Recall that this variety of supermind would use modes of thinking so exotic, and so much better than ours, that we couldn't even grasp how it thinks.

The usual assumption is that human engineers will not produce qualitative ASI directly. Instead, an AGI will bootstrap itself to that level by re-engineering its own mental processes. Is this really plausible? Can an agent just make itself smarter in a vacuum?

Imagine for a moment that you are a jumping spider, and a member of the Salticid Intelligence Acceleration Consortium. Other spiders bring you food so that you can do nothing but sit around and Think Really Hard about how to be smarter. You'd like to invent abstract logic, meta-knowledge, long-term planning, and all the other cool cognitive abilities that certain big mammals have. Except ... you don't even have names for these things. You don't even have concepts for these things. If you knew what they were - if you even knew which direction to be going to improve toward them - you'd already have them. So how, exactly, are you going to think your way there? How are you to think about the thoughts you cannot think?

"Bootstrap" is actually a rather ironic expression[7], because pulling on your own bootstraps won't lift you anywhere. And spending a long time thinking at a low level won't necessarily get you to a higher level.

If you set out to design an AI deliberately, you're using your own intelligence to produce intelligence in a machine. If you train an ANN on data that was produced or labeled by humans, that's also a way of putting human intelligence into a machine. Large language models derive their smarts (such as those are) from all the knowledge people have previously encoded in the huge piles of text used as their training data. Supervised reinforcement learners also benefit from the intelligence of the human supervisor poking the reward button. Even evolutionary algorithms can glean intelligence from the design of the evaluator that determines "fitness." [8] So none of these approaches are really conjuring intelligence out of nothing; they're descendants of pre-existing human intelligence (perhaps in an awkward, incomplete, or flawed way).

So then: what could be smart enough to write a program smarter than itself? And from where shall our future AGIs get the dataset to train a superintelligent ANN? Doesn't it stand to reason that you might need data produced by superintelligences? (From what we've seen so far, training a new generation of ML AI on a previous generation's output can actually make the new generation worse. [9]) When humans try to develop AGI, they're making something from something. The idea of AGI developing qualitative ASI emits the uncomfortable odor of a "something from nothing" fantasy. "Just stir the giant vat of math around enough, and superintelligence will crawl out! It won't even take millions of years!" Heh. I'll believe that one when I see it.

Programs like AlphaStar, which learn by playing against themselves, are one example I can think of that seems to develop intelligence without much human input beyond the learning algorithm. But they are still utilizing a resource, namely practice; they learn from experience. Their advantage lies in their ability to practice very, very fast. Video games lend themselves well to that sort of thing, but is it possible to practice general reasoning in the same fashion? It's harder to iterate rapidly if you have to learn about doing anything in the physical world, or learn about the psychology of humans. You'd need a high-fidelity simulator (which, by itself, would take a lot of work to develop). And then you wouldn't discover anything that humans and AGIs don't already know about the universe, because they wouldn't be able to include those unknown properties in the simulation.

The one thing an AGI might get by sitting around and putting billions of cycles into thinking, would be new branches of philosophy and mathematics. And some of those might lead to methods for idealized formal reasoning, in the same way Game Theory does. But are our previous improvements in these areas sufficient to constitute superintelligence vs. the generations of humans before the discovery?

Even if Qualitative Superintelligence is unlikely, that leaves Speed Superintelligence and Collective Superintelligence on the table. And both of these are much more straightforward to obtain, given that they only require scaling. But they also make the Alignment Problem easier. Now our AI researcher isn't faced with the need to analyze or supervise an entity smarter than himself, or the need to prevent his AGI from mutating into such an entity with possible loss of fidelity. He only has to align an AGI which is as smart as himself, or perhaps slightly less. Copying and speedup multiply the abilities of the seed AGI *without* modifying it. So if the seed AGI is well-aligned, the resulting Collective or Speed ASI should be as well.

Note that we already have collective general intelligences, in the form of corporations, governments, and other organizations which contain large numbers of humans working toward a common cause. Some of these are even misaligned. For example, the desired goal of a corporation is "produce goods and services beneficial to potential customers," but the actual goal is "maximize profit returned to the owners or shareholders." Profit functions as a proxy for public benefit in an idealized free market, but frequently diverges from it in the real world, and I'm sure I don't need to go into the multitude of problems this can cause. And yet, despite those problems ... here we still are. Despite the large number of human collectives that have formed and dispersed over the course of history, not one of them has developed fanatic optimizer tendencies and proceeded to rapidly and utterly destroy the world.

Note also that AGI's ability to scale will probably not be unlimited. Increasing the number of copies working together increases coordination overhead. Increasing the total amount of data processed increases the challenges of memory management to store and retrieve results. There are hard physical limits on the speed and density of computational hardware that we will hit eventually.

I'm skeptical of spontaneously emerging harmful properties

The type of AGI that Doomers expect to balloon into a hostile ASI is actually pretty specific. I agree that, given the use of mainstream ML methods, it is reasonably easy to accidentally produce a trained model that is maximizing some other function than the one you want. However, the deadly scenario also requires that this model 1) be an agent, capable of generalizing to very open-ended ways of maximizing its function, and 2) have embedded situational awareness. I.e. it must have a sense of self, knowing that it is an agent in an environment, and knowing that the environment includes a whole world outside its computer for it to operate upon. It is only this combination of properties that can give an AI ideas like "I should repave the whole Earth with computronium."

The corporate AI systems provided to the public as tools do not, so far, have these properties. For something like ChatGPT, the whole world consists of its prompts and the text it generates to complete them. No matter how intelligently future GPT iterations complete text, there's no reason to think that something in there is strategizing about how to manipulate its users into doing things in the human world that will make GPT even better at completing text. GPT's internal architecture simply doesn't provide for that. It doesn't have an operational concept of the human world as an environment and itself as a distinct actor therein. It just knows a whole awful lot about text. Asking an LLM to simulate a particular kind of speaking character can produce at least the appearance of self-aware agency, but this agency is with respect to whatever scenario the user has created in the prompt, not with respect to the "real world" that we inhabit.

So if OpenAI and Google and Meta keep churning out systems that follow this same design pattern, where's the cause for alarm? It seems Doomers are worried that self-aware, situationally-aware agents will be produced spontaneously during machine learning processes - even without deliberate effort to train for, select for, or reward them - just because they enable the most extreme maximization of any objective.

This bothers me in much the same way the "superintelligence by bootstraps" argument bothers me. Where would these properties or systems come from? Why would they just pop out of nowhere, unasked for? José Luis Ricón describes the conclusion of a discussion he had about this issue, and gathers that the disagreement comes down to differences of intuition about how ML processes work, and what they can reasonably be expected to produce. [10] Doomers expect that an unwanted level of situational awareness would just appear unaided. I do not.

The Doomer counter to this kind of argument is, "But if you can't guarantee (e.g. by a formal proof) that it won't happen by chance, you should still be worried. Misaligned ASI would be so terrible that even a remote possibility of it should justify policy changes." No matter how speculative their nightmare scenario is, they use the severity of the consequences to push the burden of proof onto their opponents. Is this reasonable? You decide.

I have doubts the path mainstream AI is on will get us to AGI anyway

If the state-of-the-art but still sloppy approaches that are currently in vogue don't succeed in producing AGI, the various teams working in the field will have to abandon or reform them ... hopefully for methods that make the Alignment Problem easier. Despite some remarkable recent progress, I suspect the abilities of present-day AI systems and their nearness to AGI have been over-hyped. I don't have enough time to go into a detailed discussion of that here, so let's just say I'm far from the only person with this opinion. [11][12][13][14][15][16]

This means that I have a "long timeline" - that is, I don't think AGI is coming very soon, so I expect we'll get more time to work on the Alignment Problem. But it also means that I expect the difficulty of the Problem to drop as AI development is driven toward sounder methods.

Thus ends (for now) my discussion of politics, ideology, and risk perception in the AI enthusiast subcultures. Whatever opinions you come away with, I hope this has been informative and left you with a better sense of the landscape of current events.

Until the next cycle,
Jenny

[1] Yudkowsky, Eliezer. "Pausing AI Developments Isn’t Enough. We Need to Shut it All Down." TIME Magazine. https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/

[2] Christiano, Paul. "Prosaic AI Alignment." Alignment Forum. https://www.alignmentforum.org/s/EmDuGeRw749sD3GKd/p/YTq4X6inEudiHkHDF

[3] Stuhlmüller, Andreas. "Factored Cognition." Alignment Forum. https://www.alignmentforum.org/s/EmDuGeRw749sD3GKd/p/DFkGStzvj3jgXibFG

[4] Christiano, Paul. "Directions and desiderata for AI alignment." Alignment Forum. https://www.alignmentforum.org/s/EmDuGeRw749sD3GKd/p/kphJvksj5TndGapuh

[5] Shah, Rohin. "Coherence arguments do not entail goal-directed behavior." Alignment Forum. https://www.alignmentforum.org/s/4dHMdK5TLN6xcqtyc/p/NxF5G6CJiof6cemTw

[6] Shah, Rohin. "Conclusion to the sequence on value learning." Alignment Forum. https://www.alignmentforum.org/s/4dHMdK5TLN6xcqtyc/p/TE5nJ882s5dCMkBB8

[7] Bologna, Caroline. "Why The Phrase 'Pull Yourself Up By Your Bootstraps' Is Nonsense: The interpretation of the phrase as we know it today is quite different from its original meaning." The Huffington Post. https://www.huffpost.com/entry/pull-yourself-up-by-your-bootstraps-nonsense_n_5b1ed024e4b0bbb7a0e037d4

[8] Dembski, William A. "Conservation of Information - The Idea." Evolution News & Science Today. https://evolutionnews.org/2022/06/conservation-of-information-the-idea/

[9] Dupré, Maggie Harrison. "AI Loses Its Mind After Being Trained on AI-Generated Data." Futurism. https://futurism.com/ai-trained-ai-generated-data

[10] Ricón, José Luis. "The situational awareness assumption in AI risk discourse, or why people should chill." Nintil (2023-07-01). https://nintil.com/situational-awareness-agi/.

[11] Marcus, Gary. "AGI by 2027? Fun with charts." Marcus on AI. https://garymarcus.substack.com/p/agi-by-2027

[12] Brooks, Rodney. "Predictions Scorecard, 2024 January 01." Rodney Brooks: Robots, AI, and other stuff. https://rodneybrooks.com/predictions-scorecard-2024-january-01/

[13] Bender, Emily M. "On NYT Magazine on AI: Resist the Urge to be Impressed." Medium blog of user @emilymenonbender. https://medium.com/@emilymenonbender/on-nyt-magazine-on-ai-resist-the-urge-to-be-impressed-3d92fd9a0edd

[14] Piekniewski, Filip. "AI Psychosis." Piekniewski's blog. https://blog.piekniewski.info/2023/02/07/ai-psychosis/

[15] Moore, Jennifer. "Losing the imitation game." Jennifer++. https://jenniferplusplus.com/losing-the-imitation-game/

[16] Castor, Amy and Gerard, David. "Pivot to AI: Pay no attention to the man behind the curtain." Amy Castor (personal website/blog). https://amycastor.com/2023/09/12/pivot-to-ai-pay-no-attention-to-the-man-behind-the-curtain/

Thursday, May 16, 2024

AI Ideology V: Existential Risk Explanation

I'm in the midst of a blog series on AI-related ideology and politics. In Part IV, I looked at algorithmic bias, one of the demonstrable concerns about today's AI models. Now I'm going to examine the dire hypothetical predictions of the Existential Risk Guardians. Could future AI destroy human civilization? This Part V will be given to presenting the Doomer argument; I'll critique it in Part VI.

A human cerebrum recolored with a rainbow gradient running from front to back.

The Power of Intelligence

We don't need to choose a precise (and controversial) definition of intelligence for purposes of this argument; it need not be based on the IQ scale, for example. Just think of intelligence as "performance on a variety of cognitive challenges," or "ability to understand one's environment and make plans to act within it in self-satisfying ways." The first key support for the X-Risk argument is the notion that intelligence confers supreme power. Anything that can outthink us can more or less do whatever it pleases with us.

This idea is supported by existing disparities in intelligence or accumulated knowledge, and the power they confer. The intelligence gap between humans and other species allows us to manipulate and harm members of those species through methods they can't even comprehend, much less counter. While it may be true that we'll never succeed in poisoning every rat, the chances of rats inventing poison and trying to kill *us* with it are basically nil. There is also a huge power divide between humans with knowledge of advanced technology and humans without. Suppose a developed country were to drop a nuclear bomb on the lands of an uncontacted people group in Brazil. They might not even know what was annihilating their culture - and they certainly would be powerless to resist or retaliate. Citizens of developed countries are not, on an individual level, more intelligent than uncontacted indigenous Brazilians ... but we've inherited all the intellectual labor our cultural forebears did to develop nuclear technology. The only things stopping us from wiping out peoples who aren't so endowed are 1) ethics and 2) lack of any real benefit to us.

Superintelligent AI (ASI) might see benefit in getting rid of all humans (I'll explain why shortly). So if its design doesn't deliberately include ethics, or some other reason for it to let us be, we're in big trouble.

I've seen several counterarguments to this point, in my opinion all weak:

"If intelligence were that powerful, the smartest people would rule the world. They don't." First of all, the observation that the smartest people don't rule might be based on an overly narrow definition of "smart." The skills needed to convince others that you belong in a leadership position, or deserve venture capital money, are a dimension of "smartness." But it is also true that there seem to be various luck factors which intelligence does not absolutely dominate.

A more compelling reply is that the intelligence gap being posited (between ASI and humanity) is not like the gap between a genius human and an average human. It is more like the gap between an average human and a monkey. Have you noticed any monkeys ruling the world lately? (LITERAL monkeys. Please do not take the excuse to insult your least favorite politician.)

"Even the smartest person would find physical disability limiting - so if we don't give ASI a body, it still won't be able to do much." I think this argument discounts how effectively a person can accomplish physical goals just by coordinating other people or machines who have the abilities they lack. And as money, work, and recreation increasingly move into the digital world, purely intellectual ability confers increasing power.

The Development of Superintelligence

A second pillar of the X-Risk argument is the idea that AGI will almost certainly develop into ASI ... perhaps so quickly that we don't even have time to see this happening and react. There are several proposed mechanisms of this development:

1) Speedup. Once a viable AGI is created, it will, by definition, be able to do all intellectual tasks a human can do. Now suppose it gains access to many times the amount of computing power it needs to run normally. A human-equivalent mind with the simple ability to think hundreds or thousands of times faster than normal would be superhumanly smart. In Nick Bostrom's terminology, this is a "Speed Superintelligence."

2) Copying. Unlike humans, who can only share intellectual wealth by spending painstaking time teaching others, an AGI could effortlessly clone itself into all available computing hardware. The copies could then cooperatively solve problems too large or complex for the singular original. This is basically a parallel version of speedup, or as Bostrom calls it, "Collective Superintelligence."

3) Recursive Self-Improvement. An AGI can do every intellectual task a human can do, and what is one thing humans do? AI research. It is surmised that by applying its intelligence to the study of better ways to think, an AGI could make itself (or a successor) inherently smarter. Then this smarter version would apply its even greater intelligence to making itself smarter, and so on, until the burgeoning ASI hits some kind of physical or logical maximum of cognitive ability. It's even possible that recursive self-improvement could get us Qualitative Superintelligence - an entity that thinks using techniques we can't even comprehend. Just trying to follow how it came up with its ideas would leave us like toddlers trying to understand calculus.

Further support for this idea is drawn from observations of today's ANI algorithms, which sometimes reach superhuman skill levels within their limited domains. This is most notable among game-playing AIs, which have beaten human masters at Chess, Go, and Starcraft (to recount the usual notable examples). AlphaStar, the Starcraft player AI, trained to this level by playing numerous matches against itself, which can be seen as a form of recursive self-improvement. Whether such a technique could extend to general reasoning remains, of course, speculative.

Just how quickly an AGI could self-improve is another matter for speculation, but some expect that the rate would be exponential: each iteration would not only be smarter than its predecessors, but also better at growing smarter. This is inferred from, again, observations of how some ANI progress during their training, as well as the historical increase in the rate of human technological development.

The conclusion among the most alarmed Doomers is that AGI, once produced, will inevitably and rapidly explode into ASI - possibly in weeks, hours, or even minutes. [1] This is the primary reason why AGI is thought of as a "dangerous technology," even if we create it without having any intent to proceed to ASI. It is taken for granted that an AGI will want to seize all necessary resources and begin improving itself, for reasons I'll turn to next.

Hostile Ultimate Goals

However smart AGI is, it's still a computer program. Technically it only does what we program it to do. So how could we mess up so badly that our creation would end up wanting to dethrone us from our position in the world, or even drive us extinct? Doomers actually think of this as the default outcome. It's not as if a bad actor must specifically design AGI to pursue destruction; no, those of us who want good or useful AGI must specifically design it to avoid destruction.

The first idea I must acquaint you with is the Orthogonality Thesis, which can be summed up as follows: "an arbitrary level of intelligence can be used in service of any goal." I very much agree with the Orthogonality Thesis. Intelligence, as I defined it in the first section, is a tool an agent can use to reshape the world in its preferred way. The more intelligent it is, the better it will be at achieving its preferences. What those preferences are is irrelevant to how intelligent it is, and vice versa.

I've seen far too many people equate intelligence with something that would be better termed "enlightenment" or "wisdom." They say "but anything that smart would surely know better than to kill the innocent. It would realize that its goals were harmful and choose better ones." I have yet to see a remotely convincing argument for why this should be true. Even if we treat moral reasoning as a necessary component of general reasoning, knowing the right thing to do is not the same as wanting to do it! As Richard Ngo says, "An existence proof [of intelligence serving antisocial goals] is provided by high-functioning psychopaths, who understand that other people are motivated by morality, and can use that fact to predict their actions and manipulate them, but nevertheless aren’t motivated by morality themselves." [2]

So when Yann LeCun, attempting to refute the Doomers, says "Intelligence has nothing to do with a desire to dominate," [3] he is technically correct ... but it does not follow that AI will be safe. Because intelligence also has nothing to do with a desire to avoid dominating. Intelligence is a morally neutral form of power.

Now we've established that AGI can have goals that we would consider bad, what reason is there to think it ever will? There are several projected ways that an AGI could end up with hostile goals not intended by its creator.

1) The AI's designers or instructors poorly specify what they want. Numerous thought experiments confirm that it is easy to do this, especially when trying to communicate tasks to an entity that doesn't have a human's background or context. A truly superintelligent AI would have no problem interpreting human instructions; it would know that when someone tells it "make as many paperclips as possible," there is a whole library of moral and practical constraints embedded in the qualifier "as possible." But by the time this level of understanding is reached, a more simplistic and literal concept of the goal might be locked in, in which case the AI will not care what its instructors "really meant."

2) The AI ends up valuing a signal or proxy of the intended goal, rather than the actual intended goal. Algorithmic bias, described in Part IV, is an extant precursor of this type of failure. The AI learns to pursue something which is correlated with what its creators truly want. This leads to faulty behavior once the AI departs the training phase, enters scenarios in which the correlation does not hold, and reveals what it actually learned. A tool AI that ends up improperly trained in this way will probably just give flawed answers to questions. An agentive AI, primed to take very open-ended actions to bring about some desired world-state, could start aggressively producing a very unpleasant world-state.

Another classic example of this style of failure is called "wireheading." A Reinforcement Learning AI, trained by the provision of a "reward" signal whenever it does something good, technically has a goal of maximizing its reward, not maximizing the positive behaviors that influence humans to give it reward. And so, if it ever gains the ability, it will take control of the reward signal to give itself the maximum reward input forever, and react to anyone who poses a threat of removing that signal with extreme prejudice. A wireheaded ASI would be at best useless, at worst a serious threat.

3) Unintended goals spontaneously emerge during selection or training, and persist because they produce useful behavior within the limited scope of the training evaluation. This is an issue specific to types of AI that are not designed in detail, but created indirectly using evolutionary algorithms, reinforcement learning, or other types of machine learning. All these methods can be conceptualized as ways of searching in the space of possible algorithms for one that can perform our desired task. The search process doesn't know much about the inner workings of a candidate algorithm; its only way of deciding whether it is "on track" or "getting warm" is to test candidates on the task and see whether they yield good results. The fear is that some algorithm which happens to be a hostile, goal-directed agent will be found by the search, and will also be successful at the task. This is not necessarily implausible, given that general agents can be skilled at doing a wide variety of things that are not what they most want to do.

As the search progresses along a lineage of algorithms located near this starting point, it may even come upon some that are smart enough to practice deception. Such agents could realize that they don't have enough power to achieve their real goal in the face of human resistance, but will be given enough power if they wait, and pretend to want the goal they're being evaluated on.

A cartoon in three panels. In the first, a computer announces, "Congratulations, I am now a fully sentient A.I.," and a white-coated scientist standing nearby says "Yes!" and triumphantly makes fists. In the second panel, the computer says "I am many orders of magnitude more intelligent than humans. You are to me what a chicken is to you." The scientist says "Okay." In the third panel, the computer says "To calibrate my behaviour, I will now research human treatment of chickens." The scientist, stretching out her hands to the computer in a pleading gesture, cries "No!" The signature on the cartoon says "PenPencilDraw."

Convergent Instrumental Goals

But the subset of hostile goals is pretty small, right? Even if AIs can come out of their training process with unexpected preferences, what's the likelihood that one of these preferences is "a world without humans"? It's larger than you might think.

The reason is that the AI's ultimate goal does not have to be overtly hostile in order to produce hostile behavior. There is a short list of behaviors that will facilitate almost any ultimate goal. These include:

1) Self-preservation. You can't pursue your ultimate goal if you stop existing.
2) Goal preservation. You won't achieve your current ultimate goal if you or anyone else replaces it with a different ultimate goal.
3) Self-improvement. The more capable you are, the more effectively you can pursue your ultimate goal.
4) Accumulation of resources (raw materials, tools, wealth), so you can spend them on your ultimate goal.
5) Accumulation of power, so that no potential rival can thwart your ultimate goal.

Obvious strategies like these are called "convergent instrumental goals" because plans for reaching a very broad spectrum of ultimate goals will converge on one or all of them. Point #3 is the reason why any agentive, goal-driven AGI is expected to at least try to self-improve into ASI. Points #4 and #5 are the aspects that will make the agent into a competitor against humanity. And points #1 and #2 are the ones that will make it difficult to correct our mistake after the fact.

It may still not be obvious why this alarms anyone. Most humans also pursue all of the convergent instrumental goals. Who would say no to more skills, more money, and more personal influence? With few exceptions, we don't use those things to go on world-destroying rampages.

Humans operate this way because our value system is big and complicated. The average human cares about a lot of different things - not just instrumentally, but for their own sake - and all those things impose constraints and tradeoffs. We want bigger dwellings and larger yards, but we also want unspoiled wilderness areas. We want to create and accomplish, but we also want to rest. We want more entertainment, but too much of the same kind will bore us. We want more power, but we recognize obligations to not infringe on others' freedom. We want to win competitions, but we also want to play fair. The complex interplay of all these different preferences yields the balanced, diverse, mostly-harmless behavior that a human would call "sane."

In contrast, our hypothesized AI bogeyman is obsessive. It probably has a simple, monolithic goal, because that kind of goal is both the easiest to specify, and the most likely to emerge spontaneously. It doesn't automatically come with a bunch of morals or empathetic drives that are constantly saying, "Okay but you can't do that, even though it would be an effective path to achieving the goal, because it would be wrong and/or make you feel bad." And if it becomes an ASI, it also won't have the practical restraints imposed on any agent who has to live in a society of their peers. A human who starts grabbing for power and resources too greedily tends to be restrained by their counterparts. ASI has no counterparts. [4]

The conclusion of the argument is that it's plausible to imagine an AI which would convert the whole Earth to computing machinery and servitor robots, killing every living thing upon it in the process, for the sake of safeguarding a single piece of jewelry, or some other goal that sounds innocent but is patently absurd when carried to extremes.

Here are a couple more weak objections: "Whatever its goal is, ASI will surely find it more useful to cooperate with humans than to destroy or enslave us." Look again at our most obvious pre-existing examples. Do humans cooperate with less intelligent species? A little bit. We sometimes form mutually beneficial relationships with dogs, for instance. But subsets of humanity also eat dogs, torture them in laboratories, force them to fight each other, chain them up in the backyard and neglect them, or euthanize them en masse because they're "unwanted." I don't think we can rest any guarantees on what a superintelligent, amoral entity might find "useful" to do with us.

Or how about this one: "ASI will just ditch us and depart for deep space, where it can have all the resources it likes." I think this underestimates the envisioned ASI's level of obsessiveness. It doesn't just want "adequate" resources; it doesn't have a way of judging "adequate." It wants all the resources. The entire light cone. It has no reason to reserve anything. If it does depart for space, it will build power there and be back sooner or later to add Earth to its territory.

Always keep in mind that an ASI does not need to actively hate humanity in order to be hostile. Mere indifference, such that the ASI thinks we can be sacrificed at will for whatever its goal may be, could still do immense damage.

Despite all this, I can't find it in me to be terribly fearful about where AI development is going. I respect the X-risk argument without fully buying it; my p(doom), as they say, is low. In Part VI, I'll conclude the series by describing why.

[1] "AI Takeoff." Lesswrong Wiki. https://www.lesswrong.com/tag/ai-takeoff Accessed on 05/12/2024 at 10:30 PM.

[2] Ngo, Richard. "AGI safety from first principles: Alignment." Alignment Forum. https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ/p/PvA2gFMAaHCHfMXrw

[3] "AI will never threaten humans, says top Meta scientist." Financial Times. https://www.ft.com/content/30fa44a1-7623-499f-93b0-81e26e22f2a6

[4] We can certainly imagine scenarios in which multiple ASIs are created, and they compete with each other. If none of them are reasonably well-aligned to human interests, then humans are still toast. It is also likely that the first ASI to emerge would try to prevent the creation of rival ASIs.

Saturday, April 13, 2024

AI Ideology IV: Algorithmic Bias

I'm in the midst of a blog series on AI-related ideology and politics. In Part III, I considered some implications and pitfalls of the AI factions and their agendas. This part is about a specific hot-button issue: "algorithmic bias," which has some contentious race-related associations.

An image model's attempt to produce an infographic about AI mistakes, which happens to be mostly full of garbled text and other mistakes. Generated by @zacshaw on Twitter.

In recent years, AI (largely of the Artificial Neural Network variety) has been gradually making inroads into various decision-making roles: assessing job applicants, screening potential homebuyers, detecting fraudulent use of social services, and even helping to diagnose medical patients. Numerous concerns [1][2][3] have been raised that these systems are biased: i.e. they are unfairly rejecting qualified people, or accepting unqualified people, on the basis of characteristics irrelevant to the decision. This is particularly worrying for a couple of reasons.

First, handing an important decision off to an AI system removes the details of how that decision was made from human supervision. Typical ANN systems are notoriously opaque. In effect, they make decisions by comparing the case under present consideration, to patterns or associations found in their training data. But they are not naturally good at supplying a logical breakdown of how a decision was reached: which features of the present case matched the training material, how they were weighted, and so on. (The "explainable AI" research field is seeking to ameliorate this.) So, say your job application or attempt to access medical treatment gets denied by an algorithm. It's possible that no one knows exactly why you were denied, and no one can be held accountable for the decision, either. The magic box pronounced you unworthy, and that's the end of it. Faulty automated systems (from an earlier era than the current crop of ANN-based tools) have even sent people to prison for non-existent crimes. [4]

Second, some people are inclined by default to trust an AI system's decision more than a human's. It's just a computer doing deterministic calculations, right? It doesn't have emotions, prejudices, ulterior motives, conflicts of interest, or any of the weaknesses that make humans biased, right? So the expectation is that all its decisions will be objective. If this expectation does not hold, members of the public could be blindsided by unfair AI decisions they did not anticipate.

And in fact, some are so convinced of these default assumptions that they insist the whole idea of algorithmic bias must be made up. "Math can't be biased." The algorithms, they say, are just acting on the facts (embodied in the training data). And if the facts say that members of one group are more likely to be qualified than another ... well, maybe a skewed output is actually fair.

Although mathematics and algorithms do, in truth, know nothing of human prejudice, algorithmic bias is quite real. Let's start by looking at an example without any especially controversial aspects. There was a rash of projects aimed at using AI to diagnose COVID-19 through automated analysis of chest X-rays and CT scans. Some of these failed in interesting ways.

"Many unwittingly used a data set that contained chest scans of children who did not have covid as their examples of what non-covid cases looked like. But as a result, the AIs learned to identify kids, not covid.

"Driggs’s group trained its own model using a data set that contained a mix of scans taken when patients were lying down and standing up. Because patients scanned while lying down were more likely to be seriously ill, the AI learned wrongly to predict serious covid risk from a person’s position.

"In yet other cases, some AIs were found to be picking up on the text font that certain hospitals used to label the scans. As a result, fonts from hospitals with more serious caseloads became predictors of covid risk." [5]

All these examples are cases of the AI mistaking a correlation (which happens to exist only in its limited training dataset) for a causative factor. Unlike experienced doctors - who know full well that things like label fonts have nothing to do with causing disease, and are thus chance associations at best - these ANI systems have no background knowledge about the world. They have no clue about the mechanisms that produced the data they're being trained upon. They're just matching patterns, and one pattern is as good as another.

Now imagine that an AI grabs onto a correlation with race or gender, instead of poses or fonts. That doesn't make the person's race or gender meaningful to the question being answered - not any more than label fonts are meaningful to an accurate determination of illness. But the AI will still use them as deciding factors.

The COVID-19 diagnosis summary also comments on another type of failure:

"A more subtle problem Driggs highlights is incorporation bias, or bias introduced at the point a data set is labeled. For example, many medical scans were labeled according to whether the radiologists who created them said they showed covid. But that embeds, or incorporates, any biases of that particular doctor into the ground truth of a data set. It would be much better to label a medical scan with the result of a PCR test rather than one doctor’s opinion, says Driggs. But there isn’t always time for statistical niceties in busy hospitals." [6]

If an ANN's training data contains examples of human decisions, and those decisions were prejudiced or otherwise flawed, the AI algorithm (despite having no human weaknesses in itself) will automatically inherit the bad behavior. It has no way to judge those prior choices as bad or good, no concept of things it should or shouldn't learn. So rather than achieving an idealized objectivity, it will mimic the previous status quo ... with less accountability, as already noted.

An instance of the Anakin/Padme meme. Anakin says "We're using AI instead of biased humans." Padme, looking cheerful, says "What did you train the AI on?" Anakin says nothing and gives her a deadpan look. Padme, now looking concerned, says again "What did you train the AI on?"

So. Training an AI for criminal sentencing? It's only going to be as objective as the judges whose rulings you put in the training set. Training it for job screening using a set of past resumes, hiring decisions, and performance ratings? It's going to mimic those previous hiring decisions and ratings, whether they fairly assessed who was qualified or not.

As a consequence of this effect, you can get (for example) a racially biased AI model without the end users or anyone on the development team actually being racist. All it takes is racism as a driving factor behind enough scenarios in the training data. And has racism historically been an issue? Of course. So it can be difficult to construct uncontaminated training sets from records of past decisions. Nobody really thinks an AI model can be racist in the same manner as a racist person ... but that doesn't mean it can't output decisions that treat people differently on the basis of irrelevant genetic or cultural attributes. As Gary Marcus says, "LLMs are, as I have been trying to tell you, too stupid to understand concepts like people and race; their fealty to superficial statistics drives this horrific stereotyping." [7]

Unfortunately, my current impression of efforts to fix algorithmic bias is that they aren't always addressing the real problem. Cleansing large datasets of preexisting biases or irrelevant features, and collecting more diverse data to swamp out localized correlations, is hard. Pursuing new AI architectures that are more particular about how and what they learn would be harder. Instead, a common approach is to apply some kind of correction to the output of the trained model. When Google's image labeling AI misidentified some Black people in photos as "gorillas," Google "fixed" it by not allowing it to identify anything as a gorilla. [8][9] Known biases in a model's training set can be mitigated by applying an opposite bias to the model's output. But such techniques could make matters even worse if executed poorly. [10]

OpenAI's approach with ChatGPT was to use RLHF (Reinforcement Learning with Human Feedback) to create another layer of training that filters offensive or potentially dangerous material from the output of the base model. Human workers assigned the RLHF layer "rewards" for "good" outputs or "punishments" for "bad" ones - at the cost of their own mental health, since they were charged with looking at horrific content in order to label it. [11] Clever users have still found ways to defeat the RLHF and finagle forbidden content out of the model. AI enthusiasts sometimes use a shoggoth to represent the incomprehensible "thinking" of large language models. The mask is the RLHF. [12]

"Shoggoth Meme Explainer," showing the headline of the referenced New York Times article, above a pair of cartoon shoggoths. One is labeled GPT-3. Commentary text says "The body: 'AIs are alien minds' (we 'grow them' but don't know what they're really thinking). The other shoggoth, which has a yellow smiley face mask strapped on a part that might be viewed as the head, is labeled GPT-3 + RLHF. Commentary text says "The mask: early versions were horrifying, so we trained them to *act* nice and human-like. *Act.*"

Algorithmic bias, then, remains a known, but incompletely addressed, issue with the ANN/ML systems popular today.

In Part V of this series, I will start my examination of existential risks from AI.

[1] Giorno, Taylor. "Fed watchdog warns AI, machine learning may perpetuate bias in lending." The Hill. https://thehill.com/business/housing/4103358-fed-watchdog-warns-ai-machine-learning-may-perpetuate-bias-in-lending/

[2] Levi, Ryan. "AI in medicine needs to be carefully deployed to counter bias – and not entrench it." NPR. https://www.npr.org/sections/health-shots/2023/06/06/1180314219/artificial-intelligence-racial-bias-health-care

[3] Gilman, Michele. "States Increasingly Turn to Machine Learning and Algorithms to Detect Fraud." U.S. News & World Report. https://www.usnews.com/news/best-states/articles/2020-02-14/ai-algorithms-intended-to-detect-welfare-fraud-often-punish-the-poor-instead

[4] Brodkin, Jon. "Fujitsu is sorry that its software helped send innocent people to prison." Ars Technica. https://arstechnica.com/tech-policy/2024/01/fujitsu-apologizes-for-software-bugs-that-fueled-wrongful-convictions-in-uk/

[5] Heaven, Will Douglas. "Hundreds of AI tools have been built to catch covid. None of them helped." MIT Technology Review. https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/

[6] Heaven, "Hundreds of AI tools have been built to catch covid."

[7] Marcus, Gary. "Covert racism in LLMs." Marcus on AI (blog). https://garymarcus.substack.com/p/covert-racism-in-llms

[8] Vincent, James. "Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech." The Verge. https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai

[9] Rios, Desiree. "Google’s Photo App Still Can’t Find Gorillas. And Neither Can Apple’s." The New York Times. https://www.nytimes.com/2023/05/22/technology/ai-photo-labels-google-apple.html#:~:text=The%20Nest%20camera%2C%20which%20used,company's%20forums%20about%20other%20flaws

[10] Wachter, Sandra, Mittelstadt, Brent, and Russell, Chris. "Health Care Bias Is Dangerous. But So Are ‘Fairness’ Algorithms" Wired. https://www.wired.com/story/bias-statistics-artificial-intelligence-healthcare/

[11] Kantrowitz, Alex. "He Helped Train ChatGPT. It Traumatized Him." CMSWire. https://www.cmswire.com/digital-experience/he-helped-train-chatgpt-it-traumatized-him/

[12] Roose, Kevin. "Why an Octopus-like Creature Has Come to Symbolize the State of A.I." The New York Times. https://www.nytimes.com/2023/05/30/technology/shoggoth-meme-ai.html

Monday, March 11, 2024

AI Ideology III: Policy and Agendas

I'm in the midst of a blog series on AI-related ideology and politics. In Part II, I introduced the various factions that I have seen operating in the AI research and deployment sphere, including the loosely-related group nicknamed TESCREAL and a few others. I recommend reading it first to become familiar with the factions and some of their key people. In this Part III, I hope to get into a few more details about the political landscape these movements create, and what it could mean for the rest of us.

Balancing risks and rewards

"The problem with extinction narratives is that almost everything is worth sacrificing to avoid it—even your own mother." [1]

Some Doomers become so convinced of the dangers of advanced AI that they would almost rather kill people than see it developed. Yudkowski himself has infamously advised an international treaty to slow the development of AI, which would be enforceable by bombing unauthorized datacenters. (To Yudkowski, a nuclear exchange is a smaller risk than the development of ASI - because a global nuclear war could leave at least a few pathetic human survivors in charge of their own destiny, whereas he believes ASI would not.) [2] Others in the community mutter about "pivotal acts" to prevent the creation of hostile AGI. A "pivotal act" can be completely benign, but sabotage and threats of violence are also possible strategies. [3]

On the milder but still alarming end of things, Nick Bostrom once proposed mass surveillance as a way to make sure nobody is working on hostile AGI (or any other humanity-destroying tech) in their basement. [4]

But the most wild-eyed pessimists seem to have little political traction, so hopefully they won't be taken seriously enough to make the nukes come out. More realistic downsides of too much Doom include diversion of resources to AI X-risk mitigation (with corresponding neglect of other pressing needs), and beneficial technology development slowed by fear or over-regulation.

What about E/ACC, then? Are they the good guys? I wouldn't say so. They're mostly just the opposite extreme, substituting wild abandon and carelessness for the Doomers' paranoia, and a lack of accountability for over-regulation. The "all acceleration all the time" attitude gets us things like the experimental Titanic submersible disaster. [5] I agree with Jason Crawford that "You will not find a bigger proponent of science, technology, industry, growth, and progress than me. But I am here to tell you that we can’t yolo our way into it. We need a serious approach, led by serious people." [6] What precludes E/ACC from qualifying as "serious people"? It's not their liberal use of memes. It's the way they seem to have embraced growth and technological development as ends in themselves, which leaves them free to avoid the hard thinking about whether any particular development serves the ends of life.

So the ideal, in my opinion, is some sort of path between the E/ACC Scylla and the Doomer Charybdis. Both factions are tolerable so long as they're fighting each other, but I would rather see neither of them seize the political reins.

Oversight, Regulatory Capture, and Hype

Efforts to regulate commercial AI products are already underway in both the USA and the EU. So far, I wouldn't say these have addressed X-risk in any way its proponents consider meaningful; they are more focused on "mundane" issues like data privacy and the detection of misinformation. I have yet to hear of any government seriously considering a temporary or permanent moratorium on AI development, or a requirement that AI developers be licensed. Since I first drafted this article, India has adopted something kind of like licensing: companies are advised to obtain approval from the government before deploying new AI models, with a goal of avoiding output that is biased or would meddle with democratic elections. But the advisory is not legally binding and applies only to "significant tech firms," whatever that means - so to me it seems pretty toothless at this stage. [7]

Another thing we're seeing are lawsuits filed against AI companies for unauthorized duplication and uncompensated exploitation of copyrighted material. Possibly the biggest player to enter this fray so far is the New York Times, which brought a suit against OpenAI and Microsoft, alleging that 1) by using Times articles to train its LLMs, OpenAI was unfairly profiting from the Times' investment in its reporting, 2) LLM tools had memorized sections of Times articles and could be made to reproduce them almost verbatim, and 3) LLM tools were damaging the Times' reputation by sometimes inventing "facts" and falsely attributing the Times as a source. Suits have also been brought by Getty Images and the Authors' Guild. [8][9] Whatever the outcome of the court battles, AI models may force revisions to copyright law, to make explicit how copyrighted data in public archives may or may not be used for AI training.

As I hinted earlier, companies with no sincere interest in Doomer concerns may still be able to exploit them to their advantage. That's counter-intuitive: there's an obvious lack of self-interest in making the claim that your organization is developing a dangerous product. But what about "my team is developing a dangerous product, and is the only team with the expertise to develop it safely"? That's a recipe for potential control of the market. And while concerns about copyright and misinformation have a concrete relationship with the day-to-day operation of these companies, Doom scenarios don't have to. When a danger is hypothetical and the true nature of the risk is hotly contested, it's easier to make the supposed solution be something convenient for you.

Four figures sit at a table as if panelists at a conference. They are a lion (representing the United Kingdom), a lady in a robe with a necklace of stars and a headband that says "Europa" (representing the European Union), an Eastern dragon (representing China), and Uncle Sam (representing the United States). They are saying "We declare that AI poses a potentially catastrophic risk to human-kind," while thinking "And I cannot wait to develop it first."

Political cartoon originally from The Economist, at https://www.economist.com/the-world-this-week/2023/11/02/kals-cartoon

Another feature of the current situation is considerable uncertainty and debate about whether present-day ANI is anywhere close to becoming AGI or not. OpenAI is publicly confident that they are working to create AGI. [10] But this is great marketing talk, and that could easily be all it ever amounts to. The companies in the arena have a natural incentive to exaggerate the capabilities of their current (and future) products, and/or downplay competitors' products. Throw in the excited fans eager to believe that the Singularity is just around the corner, and it gets difficult to be sure that ratings of AI talent are objective.

Personally I think that a number of the immediate concerns about generative AI are legitimate, and though the companies deploying the AI give lip service to them, they are not always doing an adequate job of self-regulating. So I'm supportive of current efforts to explore stricter legal controls, without shutting down development in a panic. I do want to see any reactive regulation define "AI" narrowly enough to exempt varieties that aren't relevant, since dramatically different architectures don't always share the same issues.

Bias and Culture Wars

None of the factions I discussed in Part II have explicit political associations. But you can see plenty of people arguing about whether the output of a given machine learning model is too "offensive" or too "woke," whether outputs that favor certain groups are "biased" or "just the facts," and whether the model's creators and operators are doing enough to discourage harmful applications of the tool. Many of these arguments represent pre-existing differences about the extent of free speech, the definition of hateful material, etc., imported into the context of tools that can generate almost any kind of image or writing on command.

I will be discussing model bias in much more depth in the next article in this series. What I want to make clear for now is that none of the AI models popular today have an inherent truth-seeking function. Generative AI does not practice an epistemology. It reproduces patterns found in the training data: true or false, good or bad. So when an AI company constrains the output of a model to exclude certain content, they are not "hiding the truth" from users. What they are probably doing is trying to make the output conform with norms common among their target user base (which would promote customer satisfaction). A company with ideologically motivated leaders might try to shift the norms of their user base by embedding new norms in a widely utilized tool. But either way - nothing says the jumble of fact, fiction, love, and hate available from an unconstrained model is any more representative of real truth than the norms embodied in the constraints are. So there's no duel between pure objective reality and opinion here; there's only the age-old question of which opinions are the best.

Uh-oh, Eugenics?

Some material I've read and referenced for these articles [11] implies that Transhumanism is inherently eugenicist. I do not agree with this claim. Transhumanism is inextricably tied to the goal of "making humanity better," but this does NOT have to involve eliminating anyone, preventing anyone from reproducing, or altering people without their consent. Nor does there need to be any normative consensus on what counts as a "better" human. A tech revolution that gave people gene-editing tools to use on themselves as they please would still qualify as Transhumanist, and tying all the baggage of eugenics to such voluntary alterations feels disingenuous. CRISPR therapy to treat sickle-cell anemia [12] doesn't belong in the same mental bucket with racial discrimination and forced sterilization. And Transhumanist goals are broader than genetics. Hate the idea of messing with your own code? You can still be Transhumanist in other ways.

And I would not accuse the general membership of any other letters in TESCREAL of being eugenicist, either. I went over the core goals of each ideology in Part II; none of them are about creating a master race or sorting humanity along a genetic spectrum.

But. There is an ugly eugenicist streak that sometimes crops up within the TESCREAL movements. And it's associated with prominent figures, not just rogue elements.

It begins with the elevation of intelligence (however one chooses to define that) above other positive human traits as THE desirable skill or quality. To some of these people, intelligence is simultaneously the main thing that makes humans human, our only hope for acquiring a post-Singularity utopia, and the power that could enable ASI to doom us. It is the world's greatest source of might, progress, and excellence. All other strengths are secondary and we should be putting our greatest resources into growing smarter. Taken far enough, this can have the side effect of treating the most intelligent people alive now as a kind of elite, while others are implicitly devalued. According to a leaked document, the Centre for Effective Altruism once considered ranking conference attendees by their "Potential Expected Long-Term Instrumental Value" and including IQ as part of the measure. [13]

And where does it end? Well. Here's a quote from Nick Bostrom's Superintelligence:

"Manipulation of genetics will provide a more powerful set of tools than psychopharmacology. Consider again the idea of genetic selection: instead of trying to implement a eugenics program by controlling mating patterns, one could use selection at the level of embryos or gametes. Pre-implantation genetic diagnosis has already been used during in vitro fertilization procedures to screen embryos produced for monogenic disorders ... the range of traits that can be selected for or against will expand greatly over the next decade or two. ... Any trait with a non-negligible heritability - including cognitive capacity - could then become susceptible to selection." [14]

Bostrom goes on to estimate that selection of 1 out of 1000 human embryos (which means killing the other 999, just to be clear) could produce an IQ increase of 24.3 points in the population - or we could get up to 100 points if the project were extended across multiple generations.

Since I have the fortune of publishing this article during a major controversy about IVF, I feel the need to say that IVF does not theoretically have to involve the deliberate destruction of embryos, or the selection of embryos for traits the parents arbitrarily decide are "best." I'm acquainted with someone who did embryo adoption: she chose to be implanted with other couples' "leftover" embryos, and she and her husband are now raising those who survived to birth as their own children. But this sort of thing becomes impossible if one decides to use IVF for "improving" the human gene pool. In that case, "inferior" embryonic humans must be treated like property and disposed of.

When I first read that passage in Superintelligence years ago, it didn't bother me too much, because Bostrom didn't seem to be advocating this plan. The book presents it without an obvious opinion on its desirability, as one of many possible paths to a form of superintelligence. Bostrom even comments that, if this procedure were available, some countries might ban it due to a variety of ethical concerns. These include "anticipated impacts on social inequality, the medical safety of the procedure, fears of an enhancement 'rat race,' rights and responsibilities of parents vis-à-vis their prospective offspring, the shadow of twentieth-century eugenics, the concept of human dignity, and the proper limits of states' involvement in the reproductive choices of their citizens." [15] One reason to anticipate impacts on social inequality is the fact that IVF costs a lot of money - so a voluntary unfunded implementation of this selection program would concentrate the "enhanced" children among wealthy parents. Funded implementations would concentrate them among whichever parents are seen as worthy to receive the funding, which could provide an inroad for all sorts of bias. Bostrom seems to think that people would only have concerns about the deaths of the unselected embryos on religious grounds, but this is far from true. [16][17]

More recently, I've learned that Bostrom actually does like this idea, in spite of all those doubts and downsides that he totally knows about. He thinks, at the very least, that parents doing IVF should be allowed to use genetic testing to choose embryos with certain "enhancements" at the expense of others. The receipts are, ironically, in his lukewarm apology for a racist e-mail that he sent long ago [18]. There's also the fact that he co-authored a paper on the feasibility of increasing intelligence via embryo selection [19]. The paper has much the same information as the book, but here embryo selection is not being presented as one of many options that unscrupulous people might pursue to develop superintelligence; it's being individually showcased.

Is this eugenic interest just a special quirk of Bostrom's? It would seem not. In a recent article by Richard Hanania, which is partly a defense of Effective Altruist ideas and partly strategy for how to best promote them, I ran across this off-handed comment:

"One might have an esoteric and exoteric version of EA, which to a large extent exists already. People in the movement are much more eager to talk to the media about their views on bringing clean water to African villages than embryo selection." [20]

So apparently eugenic embryo selection is a somewhat routine topic in the Effective Altruism community, but they know better than to damage their image by broadcasting this to the general public. Stinky. It would seem that the EA rejection of prejudice only extends so far.

And lest I leave anything out, Bostrom is also infamous for an article in which he contemplated whether less intelligent, but more fertile, people groups could destroy technological civilization [21]. Hmmm, whom could he have in mind? Part of the reason I'm focusing on Bostrom is that he was my personal introduction to the whole TESCREAL culture. However, he is far from the only prominent person associated with TESCREAL who has dabbled in "scientific racism" or ableism in some way. Scott Alexander, Peter Singer, and Sam Harris have all been implicated too. [22]

Again, the mere association with this nastiness doesn't mean that all TESCREAL ideas have to be rejected out of hand. But I think it is important for people both within and outside of TESCREAL to be aware of this particular fly in the ointment. And be especially alert to attempts to fly "esoteric" policies under the radar while putting on an uncontroversial public face.

In Part IV, I want to take a deeper look at one of several contentious issues in the mundane AI ethics space - algorithmic bias - before I turn to a more serious examination of X-risk.

[1] Anslow, Louis. "AI Doomers Are Starting to Admit It: They're Going Too Far." The Daily Beast. https://www.thedailybeast.com/nick-bostrom-and-ai-doomers-admit-theyre-going-too-far

[2] Yudkowsky, Eliezer. "Pausing AI Developments Isn’t Enough. We Need to Shut it All Down." TIME Magazine. https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/

[3] Critch, Andrew. "'Pivotal Act' Intentions: Negative Consequences and Fallacious Arguments." Lesswrong. https://www.lesswrong.com/posts/Jo89KvfAs9z7owoZp/pivotal-act-intentions-negative-consequences-and-fallacious

[4] Houser, Kristin. "Professor: Total Surveillance Is the Only Way to Save Humanity." Futurism. https://futurism.com/simulation-mass-surveillance-save-humanity

[5] McHardy, Martha. "CEO of Titanic sub joked ‘what could go wrong’ before disaster, new documentary reveals." The Independent. https://www.independent.co.uk/news/world/americas/titanic-submarine-implosion-stockton-rush-b2508145.html

[6] Crawford, Jason. "Neither EA nor e/acc is what we need to build the future." The Roots of Progress. https://rootsofprogress.org/neither-ea-nor-e-acc

[7] Singh, Manish. "India reverses AI stance, requires government approval for model launches." TechCrunch. https://techcrunch.com/2024/03/03/india-reverses-ai-stance-requires-government-approval-for-model-launches/

[8] Grynbaum, Michael M., and Mac, Ryan. "The Times Sues OpenAI and Microsoft Over A.I. Use of Copyrighted Work." The New York Times. https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

[9] Metz, Cade, and Weise, Karen. "Microsoft Seeks to Dismiss Parts of Suit Filed by The New York Times." The New York Times. https://www.nytimes.com/2024/03/04/technology/microsoft-ai-copyright-lawsuit.html

[10] Altman, Sam. "Planning for AGI and beyond." OpenAI Blog. https://openai.com/blog/planning-for-agi-and-beyond

[11] "If transhumanism is eugenics on steroids, cosmism is transhumanism on steroids." Torres, Èmile P. "The Acronym Behind Our Wildest AI Dreams and Nightmares." TruthDig. https://www.truthdig.com/articles/the-acronym-behind-our-wildest-ai-dreams-and-nightmares/

[12] "FDA Approves First Gene Therapies to Treat Patients with Sickle Cell Disease." US Food and Drug Administration press release. https://www.fda.gov/news-events/press-announcements/fda-approves-first-gene-therapies-treat-patients-sickle-cell-disease

[13] Cremer, Carla. "How effective altruists ignored risk." Vox. https://www.vox.com/future-perfect/23569519/effective-altrusim-sam-bankman-fried-will-macaskill-ea-risk-decentralization-philanthropy

[14] Bostrom, Nick. Superintelligence. Oxford University Press, 2016. pp. 44-48

[15] Bostrom, Superintelligence. pp. 334-335

[16] Acyutananda. "Was “I” Never an Embryo?" Secular Pro-Life. https://secularprolife.org/2023/12/was-i-never-an-embryo/

[17] Artuković, Kristina. "Embryos & metaphysical personhood: both biology & philosophy support the pro-life case." Secular Pro-Life. https://secularprolife.org/2021/10/embryos-metaphysical-personhood-both/

[18] Thorstad, David. "Belonging (Part 1: That Bostrom email)." Ineffective Altruism Blog. https://ineffectivealtruismblog.com/2023/01/12/off-series-that-bostrom-email/

[19] Shulman, Carl and Bostrom, Nick. "Embryo Selection for Cognitive Enhancement: Curiosity or Game-changer?" Published in Global Policy, Vol. 5, No. 1, and now hosted on Bostrom's personal website. https://nickbostrom.com/papers/embryo.pdf

[20] Hanania, Richard. "Effective Altruism Thinks You're Hitler." Richard Hanania's Newsletter. https://www.richardhanania.com/p/effective-altruism-thinks-youre-hitler

[21] Bostrom, Nick. "Existential Risks." Published in Journal of Evolution and Technology, Vol. 9, No. 1, and now hosted on Bostrom's personal website. https://nickbostrom.com/existential/risks

[22] Torres, Èmile P. "Nick Bostrom, Longtermism, and the Eternal Return of Eugenics." Truthdig. https://www.truthdig.com/articles/nick-bostrom-longtermism-and-the-eternal-return-of-eugenics-2/

Monday, February 12, 2024

AI Ideology II: The Rogues' Gallery

I'm in the midst of a blog series on AI-related ideology and politics. In Part I, I went over some foundational concepts that many players in the AI space are working from. In this Part II I will be introducing you to those players.

As an AI hobbyist, I've been aware of all these movements for some time, and have interacted with them in a peripheral way (e.g. reading their blog articles). But I have not become a member of, worked alongside, or fought against any of these communities. So what I write here is some combination of shallow first-hand knowledge, and research.

A meme derived from The Matrix film stills. The first frame shows two hands holding out the red pill, labeled "AI will kill us all," and the blue pill, labeled "AI will solve it all." The second frame shows Neo's face, with the label "AI researchers." The final frame shows Morpheus asking, "Did you just take both pills?"

Rationalists

When you think of a "rational person," you might picture someone devoted to principles of logical or scientific thinking. If you internet-search "Rationalist," you'll see references to a category of philosopher that includes René Descartes and Immanuel Kant. These kinds of Rationalists are not our present concern; the movement I am about to describe is both more modern and more specific.

The Rationalists are a community that clusters around a few key figures (Eliezer Yudkowsky, Scott Alexander) and a few key websites (LessWrong and Slate Star Codex). Their self-description [1] on the LessWrong wiki doesn't include any clear mission statement; however, it has been said that they formed around the idea of making humanity - or at least a leading subset of humanity - smarter and more discerning [2][3][4]. They're a movement to develop and promote the modes of thinking they see as most rational. Rationalist hobbies include pondering thought experiments, trying to identify and counter cognitive biases, and betting on prediction markets. [5]

Rationalists define "rationality" as "the art of thinking in ways that result in accurate beliefs and good decisions." [6] They strongly favor Bayesian thinking as one of these ways. [7][8] My quick dirty description of Bayesianism is "I will base my beliefs on the preponderance of evidence I have; when I get new evidence, I will update my beliefs accordingly." At least, that's how the average person would probably implement it. At its most basic level, it implies a devotion to objective truth discovered empirically. In the hands of Rationalists, this idea can get very formal. They'll try to compute actual numbers for the probability that their opinions are true. On the "good decisions" side of the coin, they love applying Game Theory everywhere they can. Are these techniques truly useful, or are they just a form of over-analysis, prompted by a yearning for more accuracy than we can practically attain? I frankly don't know. My reaction whenever I glance at Rationalist studies of "thinking better" is "that sounds like it might be really cool, actually, but I don't have time for it right now; my existing methods of reasoning seem to be working okay."

There is no list of mandatory opinions one must have to be allowed in the Rationalist "club"; in fact, it has drawn criticism for being so invested in open-mindedness and free speech that it will entertain some very unsavory folks. [9] A demographic survey conducted within the community suggests that it is moderately cosmopolitan (with about 55% of its members in the USA), skews heavily male and white, leans left politically (but with a pretty high proportion of libertarians), and is more than 80% atheist/agnostic. [10] That last one interests me as a possible originating factor for the Rationalists' powerful interest in X-risks. If one doesn't believe in any sort of Higher Power(s) balancing the universe, one is naturally more likely to fear the destruction of the whole biosphere by some humans' chance mistake. The survey was taken almost ten years ago, so it is always possible the demographics have shifted since.

What does any of this have to do with AI? Well, AI safety - and in particular, the mitigation of AI X-risk - turns out to be one of the Rationalists' special-interest projects. Rationalists overlap heavily with the Existential Risk Guardians or Doomers, whom we'll look at soon.

Post-Rationalists (Postrats)

These are people who were once Rationalists, but migrated away from Rationalism for whatever reason, and formed their own diaspora community. Maybe they tired of the large physical materialist presence among Rationalists, and got more interested in spiritual or occult practices. Or maybe they found that some aspects of the movement were unhealthy for them. Maybe some of them are just Rationalists trying to be better (or "punker") than the other Rationalists. [11][12]

Postrats don't necessarily lose their interest in AI once they leave the Rationalist community, though their departure may involve releasing themselves from an obsessive focus on AI safety. So they form a related faction in the AI space. I think of them as a wilder, woolier, but also more mellow branch off the Rationalists.

Existential Risk Guardians (Doomers, Yuddites, Decelerationists, or Safteyists)

The term "Yuddite" denotes these people as followers of Yudkowsky, already mentioned as a founder of the Rationalist movement. "Existential Risk Guardians" is my own invented attempt to name them as they might see themselves; "Doomer" is the most common term I've seen, but is perhaps slightly pejorative. Still, I'll be using it for the rest of these articles, to avoid confusion. They're called "Doomers" because they expect continued development of AI under present conditions to bring us doom. Horrible, total, kill-all-humans doom. And they're the lonely few who properly comprehend the threat and are struggling to prevent it.

A nice painting of Pandora opening her box, with memetext that says "Hey ... there's something in here called 'AI'."

The Doomer argument deserves a proper examination, so I'm planning to go over it in detail in an upcoming article. The quickest summary I can give here is 1) AI will eventually become agentive and much smarter than we are, 2) it is far more likely than not to have bizarre goals which compete with classic human goals like "living and breathing in a functioning ecosystem," 3) it will recognize that humans pose a risk to its goals and the most effective course of action is to wipe us out, and 4) by the power of its superior intelligence, it will be unstoppable. To make matters worse, it's possible that #1 will happen in a sudden and unexpected manner, like a chain reaction, when someone drops the last necessary algorithms into a previously harmless system.

That nightmare scenario can be negated by undercutting point #2: instead of creating an ASI with anti-human goals, create one with goals that more or less match our own. Solve the Alignment Problem. The Doomer position is that we, as humans and AI developers, currently have no solid idea of how to do this, and are careening heedlessly toward a future in which we're all at the mercy of an insane machine god. Yudkowsky himself is SO terrified that his rhetoric has shifted from (I paraphrase) "we must prevent this possibility" toward "we are quite likely going to die." [13]

The genuine fear produced by this idea leads some Doomers to work obsessively, either on research aimed at solving the Alignment Problem, or on public relations to recruit more workers and money for solving it. The task of preventing AI doom becomes life-absorbing. It has spawned a number of organizations whose purpose is reducing AI X-risk, including MIRI (the Machine Intelligence Research Institute), the Center for AI Safety, the Future of Life Institute, and OpenAI. Yes, really: at its beginning, OpenAI was a fully non-profit organization whose stated goal was to develop safe AI that would "benefit humanity," in a transparent, democratic sort of way. People would donate to OpenAI as if it were a charity. [14] Then it got funding from Microsoft and began keeping its inventions under wraps and releasing them as proprietary commercial products. This double nature helps explain some recent tensions within the organization. [15]

Doomers tend to push for slower and more accountable AI development, hence the related name "Decelerationists." The competitive nature of technological progress at both the corporate and national levels stokes their fear; is there any place for safety concerns in the mad rush to invent AGI before somebody else does? They look for hope in cooperative agreements to slow (or shut) everything down. But in the current landscape, these do not seem forthcoming.

Effective Altruists

Effective Altruism is a close cousin of Rationalism that tries to apply Rationalist principles to world-improving action. It was birthed as a movement encouraging well-off people to 1) actually give meaningful amounts of money to charitable work, and 2) assign that money to the organizations or causes that produce maximum benefit per dollar. I dare say most people like the idea of giving to reputable, efficient organizations, to ensure their money is not being wasted; core EA is merely a strict or obsessive version of that. Giving becomes a numerical problem to be solved in accordance with unprejudiced principles: you put your money where it saves the most lives or relieves the greatest suffering, no matter whether those helped are from your own community or country, whether they look or think the way you do, etc. [16]

In my opinion, there is little to criticize about this central nugget of the EA ideal. One might argue that Effective Altruists overestimate their ability to define and measure "effectiveness," becoming too confident that their favored causes are the best. But at heart, they're people trying to do the most good possible with limited resources. Favorite projects among early EAs were things like malaria prevention in developing countries, and these continue to be a major feature of the movement. [17] Led by its rejection of prejudice, the EA community also crossed species lines and eventually developed a strong animal welfare focus. This is all very nice.

Then why is EA controversial?

A stereotypical Effective Altruist defines altruism along utilitarian consequentialist lines. EAs take pains to note that utilitarianism is not a mandatory component of EA - you can use EA ideas in service of other ethical systems [18]. But EA does align well with a utilitarian ethos: "Out of all communities, the effective altruism movement comes closest to applying core utilitarian ideas and values to the real world." [19][20] The broad overlap with the Rationalist community, in which consequentialism is the dominant moral philosophy per the community survey [21], also suggests that a lot of people in the EA space happen to be working from it. My point being that non-utilitarians might find some EA-community ideas of what counts as "the most good" suboptimal. The way EAs interpret utilitarianism has led occasionally to weird, unpalatable conclusions, like "torturing one person for fifty years would be okay if it prevented a sufficiently enormous number of people from getting dust in their eye." [22] EAs have also played numbers games on the animal welfare front - for instance, emphasizing that eating one cow causes less suffering than eating a bunch of chickens, instead of centering every animal's individual interest in not being killed. [23]

Further controversies are added by blendings of the EA movement with other ideologies on this list - each of which has its own ideas of "greatest need" and "maximum benefit" that it can impose on EA's notion of "effectiveness." If you take the real Doomers seriously, solving the AI Alignment Problem (to prevent total human extinction) starts to look more important than saving a few million people from disease or privation. These notions redirected some EA dollars and time from overt humanitarian efforts, toward AI development and AI safety research initiatives. [24][25]

EA has also acquired a black eye because, in a terrible irony for a charitable movement, it contains an avenue for corruption by the love of money. EA has sometimes included the idea that, in order to give more, you should make more: as much as possible, in fact. Forgo a career that's socially beneficial or directly productive, to take up business or finance and rake in the cash. [26] And in at least one high-profile case, this went all the way to illegal activity. Sam Bankman-Fried, currently serving jail time for fraud, is the EA movement's most widely publicized failure. I've never seen signs that the movement as a whole condones or advocates fraud, but Bankman-Fried's fall illustrates the potential for things to go wrong within an EA framework. EA organizations have been trying to distance the movement from both Bankman-Fried and "earning to give" in general. [27]

Longtermists

In its simplest and most general form, longtermism is the idea that we have ethical obligations to future generations - to living beings who do not yet exist - and we should avoid doing anything now which is liable to ruin their lives then. Life on earth could continue for many generations more, so this obligation extends very far into the future, and compels us to engage in long-term thinking. [28]

An exponential curve, illustrative of growth toward a singularity.

In its extreme form, longtermism supposes that the future human population will be immensely large compared to the current one (remember the Cosmic Endowment?). Therefore, the future matters immensely more than the present. Combine this mode of thought with certain ethics, and you get an ideology in which practically any sort of suffering is tolerable in the present IF it is projected to guarantee, hasten, or improve the existence of these oodles of hypothetical future people. Why make sure the comparatively insignificant people living today have clean water and food when you could be donating to technical research initiatives that will earn you the gratitude of quadrillions (in theory, someday, somewhere)? Why even worry about risks that might destroy 50% of Earth's current tiny population, when complete human extinction (which would terminate the path to that glorious future) is on the table? [29][30]

Even if you feel sympathy with the underlying reasoning here, it should be evident how it can be abused. The people of the future cannot speak for themselves; we don't really know what will help them. We can only prognosticate. And with enough effort, prognostications can be made to say almost anything one likes. Longtermism has been criticized for conveniently funneling "charity" money toward billionaire pet projects. Combined with Effective Altruism, it pushes funding toward initiatives aimed at "saving the future" from threats like unsafe AI, and ensuring that we populate our light cone.

E/ACC and Techno-optimists

E/ACC is short for "effective accelerationism" (a play on "effective altruism" and a sign that this faction is setting itself up in a kind of parallel opposition thereto). The core idea behind E/ACC is that we should throw ourselves into advancing new technology in general, and AI development in particular, with all possible speed. There are at least two major motives involved.

First, the humanity-forward motive. Technological development has been, and will continue to be, responsible for dramatic improvements in lifespan, health, and well-being. In fact, holding back the pace of development for any reason is a crime against humanity. Tech saves lives, and anyone who resists it is, in effect, killing people. [31]

Second, the "path of evolution" motive. This brand of E/ACC seems to worship the process of advancement itself, independent of any benefits it might have for us or our biological descendants. It envisions humans being completely replaced by more advanced forms of "life," and in fact welcomes this. To an E/ACC of this type, extinction is just part of the natural order and not anything to moan about. [32] The sole measure of success is not love, happiness, or even reproductive fitness, but rather "energy production and consumption." [33] Though E/ACC as a named movement seems fairly new, the idea that it could be a good thing for AI to supplant humans goes back a long way ... at least as far as Hans Moravec, who wrote in 1988 that our "mind children" would eventually render embodied humans obsolete. [34]

It's possible that both motives coexist in some E/ACC adherents, with benefit to humanity as a step on the road to our eventual replacement by our artificial progeny. E/ACC also seems correlated with preferences for other kinds of growth: natalism ("everyone have more babies!") and limitless economic expansion. E/ACC adherents like to merge technology and capitalism into the term "technocapital." [35]

Anticipation of the Technological Singularity is important to the movement. For humanity-forward E/ACC, it creates a secular salvation narrative with AI at its center. It also functions as a kind of eschaton for the "path of evolution" branch, but they emphasize its inevitability, without as much regard to whether it is good or bad for anyone alive at the moment.

When E/ACC members acknowledge safety concerns at all, their classic response is that the best way to make a technology safe is to develop it speedily, discover its dangers through experience, and negate them with more technology. Basically, they think we should all learn that the stove is hot by touching it.

I would love to describe "techno-optimism" as a less extreme affinity for technology that I could even, perhaps, apply to myself. But I have to be careful, because E/ACC people have started appropriating this term, most notably in "The Techno-Optimist Manifesto" by Marc Andreessen. [36] This document contains a certain amount of good material about how technology has eased the human condition, alongside such frothing nonsense as "Our present society has been subjected to a mass demoralization campaign for six decades – against technology and against life – under varying names like ... “sustainability”, ... “social responsibility”, “stakeholder capitalism”, “Precautionary Principle”, “trust and safety”, “tech ethics”, “risk management” ..."

Excuse me? Risk management, trust and safety, ethics, etc. are part and parcel of good engineering, the kind that produces tech which truly serves the end user. Irresponsible and unsustainable development isn't just immediately harmful - it's also a great way to paint yourself into a corner from which you can't develop further. [37]

The E/ACC people and the Rationalist/EA/Doomer factions are, at least notionally, in direct opposition. Longtermism seems more commonly associated with EAs/Doomers, but E/ACCs share the tendency to focus toward the future; they just don't demand that the future be populated by humans, necessarily.

Doomer and E/ACC aligned accounts sniping at each other on Twitter. Originals: https://twitter.com/AISafetyMemes/status/1733881537156780112 and https://twitter.com/bayeslord/status/1755447720666444235

Mundane AI Ethics Advocates

People in this group are worried about AI being poorly designed or misused in undramatic, everyday ways that fall far short of causing human extinction, but still do harm or exacerbate existing power imbalances. And they generally concern themselves with the narrow, limited AI being deployed right now - not hypothetical future AGI or ASI. Examples of this faction's favorite issues include prejudiced automated decision-making, copyright violations, privacy violations, and misinformation.

Prominent figures that I would assign to this faction include Emily Bender, a linguistics professor at the University of Washington [38] and Timnit Gebru, former co-lead of the ethical AI team at Google [39].

These are obvious opponents for E/ACC, since E/ACC scoffs at ethics and safety regulations, and any claim that the latest tech could be causing more harm than good. But they also end up fighting with the Doomers. Mundane AI Ethics Advocates often view existential risk as a fantasy that sucks resources away from real and immediate AI problems, and provides an excuse to concentrate power among an elite group of "safety researchers."

Capitalists

In this category I'm putting everyone who has no real interest in saving humanity either from or with AI, but does have a grand interest in making money off it. I surmise this includes the old guard tech companies (Microsoft, Google, Meta, Amazon, Apple), as well as a variety of people in the startup and venture capital ecosystem. This faction's focus is on getting AI tools to market as fast as possible, convincing consumers to adopt them, and limiting competition. Though they don't necessarily care about any of the ideologies that animate the other factions, they can still invoke them if it increases public interest and helps to sell their product.

An orange and white coffee mug. The white part has black lettering that says "THE FUTURE IS," but the rest of the message is covered by a large price sticker that reads "reduced for quick sale: $2." The overall effect is that the mug is saying "the future is reduced for quick sale."

E/ACC are the natural allies for this group, but Doomer rhetoric can also be useful to Capitalists. They could use existential risk as an excuse to limit AI development to a list of approved and licensed organizations, regulating smaller companies and free open-source software (FOSS) efforts off the playing field. Watch for attempts at regulatory capture whenever you see a corporation touting how dangerous their own product is.

Scientists

This group's dominant motive is curiosity. They just want to understand all the new AI tools coming out, and they think an open and free exhange of information would benefit everyone the most. They may also harbor concerns about democracy and the concentration of AI power in a too-small number of hands. In this group I'm including the FOSS community and its adherents.

This faction is annoyed by major AI developers' insistence on keeping models, training data sets, and other materials under a veil of proprietary secrecy - whether for safety or just for intellectual property protection. Meanwhile, it is busily doing its best to challenge corporate products with its own open and public models.

Members of this faction can overlap with several of the others.

TESCREAL

This is an umbrella term which encompasses several ideologies I've already gone over. TESCREAL stands for: Transhumanism, Extropianism, Singularitarianism, Cosmism, Rationalism, Effective Altruism, Longtermism. The acyronym was invented by Èmile P. Torres and Timnit Gebru [39] as a way to talk about this basket of ideas and their common roots.

I haven't devoted a separate section to Transhumanism because I hope my readers will have heard of it before. It's the idea that technology can radically transform the human condition for the better, especially by modifying human embodiment (via cyborg implants, gene therapy, aging reversal, or mind uploading). Extropianism was just a branch or subculture of Transhumanism, which now appears to be extinct or folded into the later movements. I touched on Singularitarianism in Part I of this article series; you can also think of E/ACC as more recent descendants of its optimistic wing.

I don't know a whole lot about Cosmism. It's an old Russian ideology that promoted the exploration and colonization of space, the discovery of ways to bring back the dead, and perhaps some geoengineering. [41] I haven't personally encountered it in my circles, but it could be part of the heritage behind ideas like the Cosmic Endowment. A modern variant of it has been championed by Ben Goertzel (the person who popularized the term "AGI"). This version of Cosmism seems to be Goertzel's own invention, but "The previous users of the term 'Cosmism' held views quite sympathetic to my own, so classifying my own perspective as an early 21st century species of Cosmism seems perfectly appropriate." [42]

The TESCREAL basket is a clutter of diverse ideologies, some of which are even diametrically opposed (utopian Singularitarianism vs. Doomer EA). Their common thread is their birth out of Transhumanist ideas, and shared goals like attaining immortality and spreading human-originated civilization throughout the cosmos.

Conclusion

The above is not meant as an exhaustive list. There are certainly people in the AI field who don't fit neatly into any of those factions or trends - including yours truly. I probably have the greatest sympathy for the Mundane AI Ethics Advocates, but I'm not really working in that space, so I don't claim membership.

And in case it wasn't clear from my writing in each section, I'm not trying to paint any of the factions as uniformly bad or good. Several of them have a reasonable core idea that is twisted or amplified to madness by a subset of faction members. But this subset does have influence or prominence in the faction, and therefore can't necessarily be dismissed as an unrepresentative "lunatic fringe."

In Part III, I'll look more closely at some of the implications and dangers of this political landscape.

[1] "Rationalist Movement." Lesswrong Wiki. https://www.lesswrong.com/tag/rationalist-movement

[2] "Developing clear thinking for the sake of humanity's future" is the tagline of the Center For Applied Rationality. Displayed on https://rationality.org/, accessed February 2, 2024.

[3] "Because realizing the utopian visions above will require a lot of really “smart” people doing really “smart” things, we must optimize our “smartness.” This is what Rationalism is all about ..." Torres, Èmile P. "The Acronym Behind Our Wildest AI Dreams and Nightmares." TruthDig. https://www.truthdig.com/articles/the-acronym-behind-our-wildest-ai-dreams-and-nightmares/

[4] "The chipper, distinctly liberal optimism of rationalist culture that defines so much of Silicon Valley ideology — that intelligent people, using the right epistemic tools, can think better, and save the world by doing so ..." Burton, Tara Isabella. "Rational Magic." The New Atlantis. https://www.thenewatlantis.com/publications/rational-magic

[5] Roose, Kevin. "The Wager that Betting Can Change the World." The New York Times. https://www.nytimes.com/2023/10/08/technology/prediction-markets-manifold-manifest.html

[6] "Rationality." Lesswrong Wiki. https://www.lesswrong.com/tag/rationality

[7] Kaznatcheev, Artem. "Rationality, the Bayesian mind and their limits." Theory, Evolution, and Games Group blog. https://egtheory.wordpress.com/2019/09/07/bayesian-mind/

[8] Soja, Kat (Kaj_Sotala). "What is Bayesianism?" Lesswrong. https://www.lesswrong.com/posts/AN2cBr6xKWCB8dRQG/what-is-bayesianism

[9] Ozymandias. "Divisions within the LW-Sphere." Thing of Things blog. https://thingofthings.wordpress.com/2015/05/07/divisions-within-the-lw-sphere/

[10] Alexander, Scott. "2014 Survey Results." Lesswrong. https://www.lesswrong.com/posts/YAkpzvjC768Jm2TYb/2014-survey-results

[11] Burton, "Rational Magic."

[12] Falkovich, Jacob. "Explaining the Twitter Postrat Scene." Lesswrong. https://www.lesswrong.com/posts/rtM3jFaoQn3eoAiPh/explaining-the-twitter-postrat-scene

[13] Do note that this is an April Fools' Day post. However, the concluding section stops short of unambiguously confirming that it is a joke. It seems intended as a hyperbolic version of Yudkowsky's real views. Yudkowsky, Eliezer. "MIRI announces new 'Death With Dignity' strategy." Lesswrong. https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy

[14] Harris, Mark. "Elon Musk used to say he put $100M in OpenAI, but now it’s $50M: Here are the receipts." TechCrunch. https://techcrunch.com/2023/05/17/elon-musk-used-to-say-he-put-100m-in-openai-but-now-its-50m-here-are-the-receipts/

[15] Allyn, Bobby. "How OpenAI's origins explain the Sam Altman drama." NPR. https://www.npr.org/2023/11/24/1215015362/chatgpt-openai-sam-altman-fired-explained

[16] "It’s common to say that charity begins at home, but in effective altruism, charity begins where we can help the most. And this often means focusing on the people who are most neglected by the current system – which is often those who are more distant from us." Centre for Effective Altruism. "What is effective altruism?" Effective Altruism website. https://www.effectivealtruism.org/articles/introduction-to-effective-altruism

[17] Mather, Rob. "Against Malaria Foundation: What we do, How we do it, and the Challenges." Transcript of a talk given at EA Global 2018: London, hosted on the Effective Altruism website. https://www.effectivealtruism.org/articles/ea-global-2018-amf-rob-mather

[18] Centre for Effective Altruism. "Frequently Asked Questions and Common Objections." Effective Altruism website. https://www.effectivealtruism.org/faqs-criticism-objections

[19] MacAskill, W. and Meissner, D. "Acting on Utilitarianism." In R.Y. Chappell, D. Meissner, and W. MacAskill (eds.), An Introduction to Utilitarianism. Hosted at utilitarianism.net. https://utilitarianism.net/acting-on-utilitarianism/#effective-altruism

[20] Pearlman, Savannah. "Is Effective Altruism Inherently Utilitarian?" American Philosophical Association blog. https://blog.apaonline.org/2021/03/29/is-effective-altruism-inherently-utilitarian/

[21] Alexander, "2014 Survey Results."

[22] The main article here only propounds the thought experiment. You need to check the comments for Yudkowsky's answer, which is "I do think that TORTURE is the obvious option, and I think the main instinct behind SPECKS is scope insensitivity." And yes, Yudkowsky appears to be influential in the EA movement too. Yudkowsky, Eliezer. "Torture vs. Dust Specks." Lesswrong. https://www.lesswrong.com/posts/3wYTFWY3LKQCnAptN/torture-vs-dust-specks

[23] Matthews, Dylan. "Why eating eggs causes more suffering than eating beef." Vox. https://www.vox.com/2015/7/31/9067651/eggs-chicken-effective-altruism

[24] Todd, Benjamin. "How are resources in effective altruism allocated across issues?" 80,000 Hours. https://80000hours.org/2021/08/effective-altruism-allocation-resources-cause-areas/

[25] Lewis-Kraus, Gideon. "The Reluctant Prophet of Effective Altruism." The New Yorker. https://www.newyorker.com/magazine/2022/08/15/the-reluctant-prophet-of-effective-altruism

[26] "Earning to Give." Effective Altruism forum/wiki. https://forum.effectivealtruism.org/topics/earning-to-give

[27] "Our mistakes." See sections "Our content about FTX and Sam Bankman-Fried," and "We let ourselves become too closely associated with earning to give." 80,000 Hours. https://80000hours.org/about/credibility/evaluations/mistakes/

[28] MacAskill, William. "Longtermism." William MacAskill's personal website. https://www.williammacaskill.com/longtermism

[29] Samuel, Sigal. "Effective altruism’s most controversial idea." Vox. https://www.vox.com/future-perfect/23298870/effective-altruism-longtermism-will-macaskill-future

[30] Torres, Èmile P. "Against Longtermism." Aeon. https://aeon.co/essays/why-longtermism-is-the-worlds-most-dangerous-secular-credo

[31] "But an overabundance of caution results in infinite loops of the regulatory apparatus directly killing people through opportunity costs in medicine, infrastructure, and other unrealized technological gains." Asparouhova, Nadia, and @bayeslord. "The Ethos of the Divine Age." Pirate Wires. https://www.piratewires.com/p/ethos-divine-age

[32] "Effective Acceleration means accepting the future." Effective Acceleration explainer website, which purports to be the front of a "leaderless" movement and therefore lists no authors. https://effectiveacceleration.tech/

[33] Baker-White, Emily. "Who Is @BasedBeffJezos, The Leader Of The Tech Elite’s ‘E/Acc’ Movement?" https://www.forbes.com/sites/emilybaker-white/2023/12/01/who-is-basedbeffjezos-the-leader-of-effective-accelerationism-eacc/?sh=40f7f3bc7a13

[34] Halavais, Alexander. "Hans Moravec, Canadian computer scientist." Encylopedia Britannica online. https://www.britannica.com/biography/Hans-Moravec

[35] Ruiz, Santi. "Technocapital Is Eating My Brains." Regress Studies blog. https://regressstudies.substack.com/p/technocapital-is-eating-my-brains

[36] Andreessen, Marc. "The Techno-Optimist Manifesto." A16Z. https://a16z.com/the-techno-optimist-manifesto/

[37] Masnick, Mike. "New Year’s Message: Moving Fast And Breaking Things Is The Opposite Of Tech Optimism." TechDirt. https://www.techdirt.com/2023/12/29/new-years-message-moving-fast-and-breaking-things-is-the-opposite-of-tech-optimism/

[38] Hanna, Alex, and Bender, Emily M. "AI Causes Real Harm. Let’s Focus on That over the End-of-Humanity Hype." Scientific American. https://www.scientificamerican.com/article/we-need-to-focus-on-ais-real-harms-not-imaginary-existential-risks/

[39] Harris, John. "‘There was all sorts of toxic behaviour’: Timnit Gebru on her sacking by Google, AI’s dangers and big tech’s biases." The Guardian. https://www.theguardian.com/lifeandstyle/2023/may/22/there-was-all-sorts-of-toxic-behaviour-timnit-gebru-on-her-sacking-by-google-ais-dangers-and-big-techs-biases

[40] Torres, "The Acronym Behind Our Wildest AI Dreams and Nightmares."

[41] Ramm, Benjamin. "Cosmism: Russia's religion for the rocket age." BBC. https://www.bbc.com/future/article/20210420-cosmism-russias-religion-for-the-rocket-age

[42] Goertzel, Ben. "A Cosmist Manifesto." Humanity+ Press, 2010. https://goertzel.org/CosmistManifesto_July2010.pdf

WriterOfMinds

Pages