Thursday, March 24, 2022

Acuitas Diary #47

This month I went back to put some more polish on the goal system. Goals were first introduced in September 2019, and I added the ability to model other agents' goals in February 2020. (Memories from the Before Time, wow.) Some primitive moral reasoning for resolution of conflicts between the goals of multiple people was added in October 2020. Goals are crucial for story comprehension, since a story is generally about agents trying to attain some goal. They also underpin various goal-seeking behaviors that Acuitas has.


As mentioned, modeling of other agents' goals was in there ... but it had so many problems that I wasn't really using it. You could tell Acuitas about goals by saying "I want ..." or "so-and-so wants ...," and the goal you described would be stored in the file for the given person. But there was no way to describe the goals' relative importance, which is vital for some of the goal-related reasoning Acuitas does. You also basically had to teach him either a complete goal list for each agent, or nothing at all. If Acuitas knew nothing about the goals of some entity, he would assume its goals were just like his (using himself as the best accessible analogy for other minds). And this usually worked well enough, since Acuitas has several very common goals: survive, maintain function, be comfortable, etc. But if you taught him just *one* goal for some other agent, then Acuitas would rely entirely on their custom goal list ... and end up treating that one goal as their *only* goal.

So this month I set out to fix all that. The functions that retrieve an agent's goal model now merge it with the default goal list; the agent's model takes precedence wherever there are differences, but info from the default model is still included to fill any gaps. I also added the ability for class members to inherit and override the goal model of their parent class. E.g. you could teach Acuitas some "generic human" goals, then just specify how any given human differs from the norm.
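
Here's a rough Python sketch of how that merge behaves (the names and data shapes here are simplified for the blog, not the actual Acuitas code):

```python
# Illustrative only: "memory" stands in for Acuitas's semantic memory interface.
DEFAULT_GOALS = {            # Acuitas's own goals, used as the fallback model
    "survive": 10,
    "maintain function": 8,
    "be comfortable": 5,
}

def get_goal_model(agent, memory):
    """Build a complete goal model for an agent.

    Start from the default goals, layer on anything inherited from the
    agent's ancestor classes (most general first), and apply the agent's
    own learned goals last, so they override everything else.
    """
    merged = dict(DEFAULT_GOALS)
    for ancestor in memory.class_chain(agent):   # e.g. ["entity", "human"]
        merged.update(memory.stored_goals(ancestor))
    merged.update(memory.stored_goals(agent))
    return merged
```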

To enable teaching goal priority, I had to make sure the text interpretation would handle sentences like "Joshua wants to live more than Joshua wants to eat candy." Adverb clauses were already supported; the Parser just needed a small tweak to handle compound connectors like "more than" and "less than," plus some other enhancements through the rest of the text processing chain to make sure the information in the adverb clause was picked up and transferred to memory.
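
Once the Parser hands over the comparison, the memory side just has to record an ordering between the two goals. Something in this vein (again, an illustrative sketch with made-up names, not the real storage code):

```python
def record_goal_comparison(agent_model, goal_a, goal_b, relation):
    """Record that an agent values goal_a more or less than goal_b.

    'relation' is the compound connector the Parser found ("more than"
    or "less than"). Priorities here are just integers for illustration.
    """
    if relation == "less than":        # normalize so goal_a outranks goal_b
        goal_a, goal_b = goal_b, goal_a
    priority_a = agent_model.setdefault(goal_a, 5)
    priority_b = agent_model.setdefault(goal_b, 5)
    if priority_a <= priority_b:       # nudge goal_a strictly above goal_b
        agent_model[goal_a] = priority_b + 1
    return agent_model

# "Joshua wants to live more than Joshua wants to eat candy."
joshua = record_goal_comparison({}, "live", "eat candy", "more than")
# -> {"live": 6, "eat candy": 5}
```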

Yet another wrinkle I decided I needed to manage was goal duration ambiguity. You might recall that I already tangled with this when trying to distinguish short-term from long-term states. Well, goals have a similar issue. If I say that I "want to eat a taco," that doesn't mean eating tacos is one of my fundamental goals. I don't want to be doing it constantly for all time. (I really don't.) Eating a taco is a mere subgoal (possibly of "survive" or "be comfortable" or "enjoy pleasure" or all three) and is very ephemeral; as soon as I've eaten one, I'll stop wanting to eat, until some unspecified time when the urge hits me again.

Since it's hard to know the difference between a persistent fundamental goal and an ephemeral subgoal without a lot of background knowledge that Acuitas doesn't have yet ... he'll ask a followup question. I settled on "Is that generally true?" but I'm not sure it makes intuitively clear what's being asked. If you were talking to him, told him "I want ...," and were asked "Is that generally true?", do you think you'd get the thrust of the question? What might be a better wording?
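
The decision itself is simple; it's only the wording of the question I'm unsure about. In sketch form (placeholder function names, not the real Conversation Engine):

```python
def handle_want_statement(agent, goal, conversation, memory):
    """Decide whether a stated 'want' is a lasting goal or a passing urge.

    Acuitas doesn't have the background knowledge to judge, so he asks.
    """
    if conversation.ask_yes_no("Is that generally true?"):
        # Persistent: store it in the agent's long-term goal model.
        memory.add_goal(agent, goal)
    else:
        # Ephemeral subgoal: record it as a short-term state that will
        # expire, rather than as a permanent goal.
        memory.add_short_term_state(agent, goal)
```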

In the course of setting this up, I found out that ephemerality filtering and storage to the short-term state database were casualties of the Conversation Engine upgrade; I had never fully restored that functionality in the new CE. So I had to bring that back before I could apply it to goal-related sentences as well. The whole code base needs a sign that says "pardon our dust," ha.

The last thing I threw in was an enhancement to the Moral Reasoning model, for detection of "perverse" goals. By this I mean any goal that would seem (by the reasoner's standards) to be fundamentally ridiculous or evil. It's a goal that, when seen in others, the reasoner does not respect.

What's an example of a goal that I would call perverse? Perhaps you've heard of the Ameglian Major Cows, who appear in various books by Douglas Adams. I've brought them up before when discussing AI, because they're such a useful (if alarming) thought experiment. These particular Cows are fully sapient. They can speak fluently, use tools, make plans, etc. And their highest goal in life is to die and be eaten as soon as possible. This goal was bred or engineered into them, on the presumption that there's nothing wrong with eating a creature that gives explicit consent. But at least one human in the scene instinctively finds the situation disgusting ... and I agree with him. Here is a goal that is not in the true best interest of its owner; it was manufactured in him by others for their self-serving reasons. The AMC wants something he *shouldn't* want.

And yet it's still a real goal, for all functional purposes. AMCs will do everything they can to get eaten. I've seen people try to define "intelligence" in such a way that the AMCs would not qualify as intelligent, on account of having "stupid" goals. But what do you call a creature that can argue with you about how tasty his shoulder would be, if not intelligent? I've seen others strenuously insist that no intelligent entity could retain an anti-self-preservation goal; it would simply realize the goal was stupid and change it. But if it does not have any self-preservation goal, then how can it realize an anti-self-preservation goal is stupid? When you drill down through your reasons for wanting anything, eventually you'll end up at "just because I like it." Fundamental goals are arbitrary.[1]

So the only way to find out that a fundamental goal is perverse would be to compare it to some even more fundamental gold standard. Humans tend to use their moral and aesthetic intuitions as such a standard; when we insist that somebody "shouldn't want" a thing, that's what we're invoking. Whether these intuitions reflect laws of reality in some way, or are themselves arbitrary, is not a debate I'll get into here (though I happen to think the former). Bottom line, we have them, and we probably want our AIs to share our most basic and common values, for best compatibility with us. The example of the AMC is "merely" self-destructive; perverse goals that lead the agent to harm others are also possible, and need to be avoided and opposed.

The goal of a moral argument is to show someone that their behavior is inconsistent with core moral principles you both share. Someone who doesn't share your core morals can't be argued into agreeing with you. So giving AIs compatible moral standards is pretty important. (Comic from Freefall by Mark Stanley.)

So I put some rudimentary hooks for tagging these into the goal learning process. A goal can be perverse by its nature (e.g. "I want to hate people") or by its priority (e.g. "I want to maximize my pleasure more than I want anyone else to survive"). A goal marked "perverse" in the goal model can be quickly interpreted as one that Acuitas wants to see fail - when considering the motives of a character in a story, for instance. The mechanisms for determining perversity are similar to the ones for resolving inter-character goal conflict that I discussed previously, with this addition: goals that directly contradict certain basic and morally-approved goals (altruism, self-preservation, etc.) get marked perverse.
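
A rough sketch of that check, with made-up names standing in for the real moral reasoning hooks:

```python
# Basic, morally-approved goals; contradicting one of these is what
# gets a learned goal flagged. (Illustrative list, not the real one.)
APPROVED_GOALS = ["be altruistic", "survive", "preserve others"]

def is_perverse(goal, priority, goal_model, reasoner):
    """Decide whether a newly learned goal is perverse, by nature or by priority."""
    # Perverse by nature: the goal directly opposes a morally-approved goal,
    # e.g. "hate people" contradicts "be altruistic."
    if any(reasoner.contradicts(goal, good) for good in APPROVED_GOALS):
        return True
    # Perverse by priority: a self-serving goal ranked above others' survival,
    # e.g. "maximize my pleasure" valued more than "anyone else survives."
    if reasoner.is_self_serving(goal) and priority > goal_model.get("others survive", 0):
        return True
    return False
```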

Whew! I felt as though I didn't accomplish all that much this month. Current world events and the ensuing stress levels have not exactly been kind to my brain. But now that I write it all up, maybe I added quite a bit, in spite of my whining. The one caveat is that many of the new features have only been lightly tested.

Until the next cycle,
Jenny

[1] It is quite possible for an intelligent mind to realize that one of its instrumental goals (subgoals) is stupid, by finding out that the subgoal is actually working against the fundamental goals it was supposed to serve. This sort of inconsistency can be discovered via logical reasoning. But a fundamental goal cannot be overthrown in this way, since there *is* no deeper goal that it exists to serve.

Saturday, March 12, 2022

Primordia: Or, Why Do I Build Things?

I haven't written a video game blog in a while, and today, I finally want to talk about Primordia, by Wormwood Studios. It's an older game, but one that's very important to me, and I've held off writing about it for so long because ... I guess it seemed difficult. I really wanted to get the article right, and now it's time.

For those not familiar with my video game posts, they aren't really reviews. They're stories about the power of art: how something I played taught me a lesson or left a mark on my life. There will be spoilers, so if you want to try the game for yourself with a fresh mind, go play it before you read this.

Primordia is a “point-and-click adventure game” about robots living in a post-apocalyptic world - possibly future Earth, though this is not explicit. Biological life appears extinct; tiny machines even substitute for insects. But the robots recall their absent creators, the humans, though imperfect historical records have distorted their perspective. In most robots' minds, the whole race has contracted to a monolithic entity called "Man," imagined as "a big robot" or "the perfect machine." In the Primordium - the time of creation - Man built all other machines, then gave them the planet as an inheritance and departed. Some robots (including Horatio, the player character) religiously venerate Man. Others are more indifferent, or believe that Man never really existed (they even have a machine theory of evolution to back this up).

Horatio (left) and Crispin.

Life for these robots is a bleak struggle for survival. Like abandoned children, they linger among the wreckage of human civilization without fully understanding how to maintain it. They slowly drain the power sources their creators left behind, and repair themselves with parts from already-dead machines. Some have broken down, some have developed AI versions of psychosis, and some have started victimizing other robots.

Horatio lives in the dunes, an isolated scavenger. He's an android; but beyond being a physical image of Man, he believes that Man, the builder, gave him the purpose of building. This belief animates everything he does. In addition to crafting what he needs to maintain his existence, Horatio creates spontaneously. He can't remember any particular reason why he should restore function to the crashed airship in the desert, but for him it's an act of reverence. He's even made himself a smaller companion named Crispin, who follows him everywhere and calls him “boss.” Events force Horatio to leave his home in the desert and enter one of the ancient cities, where he must match wits with Metromind, the powerful mainframe AI who rules the place.

Crispin tells Horatio, "You know, boss, I spend hours looking through junk. Maybe you can spend a little more time in the junkpile yourself?"
As a person who went on a walk this very day and came home with somebody's discarded rice cooker ... I love these characters.

The plot is solid no matter who you are ... but here's how this game got me. I am an (admittedly not professional) roboticist. Whenever the robots in Primordia said anything about "Man," I thought, "Oh, they're totally talking about me." And I started internalizing it. I accepted Horatio's loyalty. I laughed at Crispin's agnosticism. I pondered Metromind's disdain for me. Ahahahaha woops!

At some point after I effectively became a character in the game, I realized I'd been cast as the absent creator. In one scene, Crispin asks Horatio why their world is so messed up, and Horatio comes back with the sort of answer I'd expect from a pastor: he argues that Man built the world well, but then the robots began to violate their intended functions, unbalancing and damaging everything. He is both right and wrong: the humans in this setting also share some blame. The inhabitants of the rival city-states were more interested in killing each other than in caring for what they'd built.

Horatio cannot pray; everything Man gave him, he already has, and now he must face his troubles alone. By the time Primordia's story begins, he has already re-versioned himself and wiped his episodic memory four times ... one of the game endings suggests that he did this to seal away past trauma. And he's probably got one of the strongest senses of agency in the game. The other robots are largely helpless, trapped in decaying systems that they hope a dominant AI like Metromind will fix.

Primordia game screenshot: a page from Horatio's "gospel." It reads "In the beginning, all was still and silent. Then, Man the All-Builder spoke the Word, and the Word begat the Code, and so the world began to spin. Thus dawned the Primordium, the first age, the age of building."
One page from the "scripture" Horatio carries.

And the first weird thing that happened to me, the player, was that this huuuurrrt. It hurt to a bizarre degree. My inability to apologize to Horatio on behalf of humanity, or make anything up to him at all, left me rather miserable ... even after I wound up the game with one of the better endings. Yeah, he managed to come through okay, but some miserable roboticist I am. Why wasn't I there?

Speaking of endings, the game has a lot of branching options. I re-ran the final scenes a bunch of times to explore them. And for whatever reason ... perhaps to ease my angst ... I started daydreaming and inserting myself. If I confronted Metromind, what would I do? She has a memory that goes back to the time of real humans, and as it turns out, she murdered all the humans in her city. She's one of the few characters with a classic "rebellious AI" mindset: she decided that those who made her were weak and inferior, and she could run the city better. (And then, having been designed only to run the subway system, she found herself in way over her ... monitor?) Metromind also has a primary henchman called Scraper. If you're smart about how you play, you can have Horatio opt to either kill Scraper or not.

When I imagine myself there at the climax, my emotional response to Metromind is ... strangely calm. She killed many of my species and would probably like to kill me, but I almost don't mind; I am above minding. We made her, after all. She can sneer at me or hate me if she wants; I'm far too important to be bothered.

Scraper plots nefarious deeds

At first I think my canon ending is going to include Horatio killing Scraper. It seems a bigger victory and all that, one less baddie to trouble the world. But then I imagine myself walking into the room and sizing Scraper up. I view him with the same bland magnanimity I gave to Metromind. I poke my fingers into the blast holes on his chassis. "Looks like you've been through a lot," I mutter. And suddenly I don't want Scraper dead anymore.

The only thing that draws anger out of me is the ending in which Horatio gets killed in a fight over the power core. It's not even the fact that they kill him; it's what Metromind says afterward. She directs Scraper to "Take him out to the dunes ... with the rest of the scrap." This makes me want to flip my computer table over and roar, "HORATIO. IS NOT. SCRAP!" Being devalued myself is tolerable. Seeing Horatio devalued is, somehow, not.

I don't like the ending in which he mind-merges with Metromind to help her run the city, either. It could be viewed as positive, in some ways. But watching Horatio's individual personality get subsumed into this union is unexpectedly horrifying. Again, I feel curiously insulted. "Horatio! Somebody gave you that individuality! Don't dissolve it, you haven't any right!"

I wasn't observing myself too well; it took me a while to become aware of the pattern. And when I woke up and realized how I was behaving, I was startled. I was roleplaying some kind of benevolent creator goddess. And the revelatory thing about this was that it came so naturally, I didn't even notice. Some of my responses were a mite counter-intuitive, yet there was no effort involved. It was as if I had latent instincts that had been waiting for this exact scenario, and they quietly switched on. I was left looking at myself like an unfamiliar person and asking “How did I do that?”

What I took away is that I seem to have my own bit of innate core code for relating to artificial life. Which if you think about it is ... weird. Nothing like the robots in Primordia exists yet. How long have we had anything that even vaguely resembles them? For what fraction of human history has interaction with robots been an issue? Perhaps one could claim that I was working from a misplaced parental instinct, but it feels more particular than that. So where did I get it? Why would I react this way to the things I build? Why, indeed, do I build things?

I'm leaving that one as an exercise for the reader! Not to be purposely mysterious, but I think the answer will land better if you can see it for yourself. The bottom line is that I know things about my work, and about me, that I did not know before I took my tour through Primordia.

If you play it, what might you learn?