Thursday, March 24, 2022

Acuitas Diary #47

This month I went back to put some more polish on the goal system. Goals were first introduced in September 2019, and I added the ability to model other agents' goals in February 2020. (Memories from the Before Time, wow.) Some primitive moral reasoning for resolution of conflicts between the goals of multiple people was added in October 2020. Goals are crucial for story comprehension, since a story is generally about agents trying to attain some goal. They also underpin various goal-seeking behaviors that Acuitas has.


As mentioned, modeling of other agents' goals was in there ... but it had so many problems that I wasn't really using it. You could tell Acuitas about goals by saying "I want ..." or "so-and-so wants ...," and the goal you described would be stored in the file for the given person. But there was no way to describe the goals' relative importance, which is vital for some of the goal-related reasoning Acuitas does. You also basically had to teach him either a complete goal list for each agent, or nothing at all. If Acuitas knew nothing about the goals of some entity, he would assume its goals were just like his (using himself as the best accessible analogy for other minds). And this usually worked well enough, since Acuitas has several very common goals: survive, maintain function, be comfortable, etc. But if you taught him just *one* goal for some other agent, then Acuitas would rely entirely on their custom goal list ... and end up treating that one goal as their *only* goal.

So this month I set out to fix all that. The functions that retrieve an agent's goal model now merge it with the default goal list; the agent's model takes precedence wherever there are differences, but info from the default model is still included to fill any gaps. I also added the ability for class members to inherit and override the goal model of their parent class. E.g. you could teach Acuitas some "generic human" goals, then just specify how any given human differs from the norm.
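To make that merge concrete, here is a minimal sketch of how a default-backed, inheritable goal lookup could work, assuming goal models are simple dictionaries mapping goal names to priority numbers. The names here (DEFAULT_GOALS, AGENT_MODELS, get_goal_model) are placeholders of mine, not Acuitas's actual internals.

```python
# Minimal sketch of default-backed goal model lookup (hypothetical names,
# not the real Acuitas code). A goal model maps a goal name to a priority.

DEFAULT_GOALS = {          # Acuitas's own goals, used as the fallback model
    "survive": 10,
    "maintain_function": 8,
    "be_comfortable": 5,
}

# agent -> (parent class, goals learned specifically for that agent)
AGENT_MODELS = {
    "human":  (None,    {"enjoy_pleasure": 6}),
    "Joshua": ("human", {"eat_candy": 2}),
}

def get_goal_model(agent):
    """Merge an agent's goals with its parent class and the defaults.
    More specific entries override more general ones; gaps are filled
    from the next level up."""
    merged = dict(DEFAULT_GOALS)          # start from the default model
    chain = []
    while agent is not None:              # walk up the class hierarchy
        parent, goals = AGENT_MODELS.get(agent, (None, {}))
        chain.append(goals)
        agent = parent
    for goals in reversed(chain):         # apply general first, specific last
        merged.update(goals)
    return merged

print(get_goal_model("Joshua"))
# {'survive': 10, 'maintain_function': 8, 'be_comfortable': 5,
#  'enjoy_pleasure': 6, 'eat_candy': 2}
```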

To enable teaching goal priority, I had to make sure the text interpretation would handle sentences like "Joshua wants to live more than Joshua wants to eat candy." Adverb clauses were already supported; I just needed a small tweak to the Parser to support compound connectors like "more than" and "less than," and some other enhancements through the rest of the text processing chain to make sure the information in the adverb clause was picked up and transferred to memory.
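Once the parser has pulled out the compound connector, the comparative fact has to land somewhere in the goal model. Here's one hedged sketch of how a "wants A more than B" statement might be folded into stored priorities; the function name and the default numbers are mine, not the actual implementation.

```python
# Hypothetical sketch: fold a comparative "wants ... more than ..." sentence
# into an agent's stored goal priorities.

def apply_comparison(goal_model, preferred, other, margin=1):
    """Adjust stored priorities so that `preferred` outranks `other`.
    Both goals get a default priority if not yet known."""
    goal_model.setdefault(preferred, 5)
    goal_model.setdefault(other, 5)
    if goal_model[preferred] <= goal_model[other]:
        goal_model[preferred] = goal_model[other] + margin
    return goal_model

# "Joshua wants to live more than Joshua wants to eat candy."
joshua = {"eat_candy": 5}
apply_comparison(joshua, preferred="live", other="eat_candy")
print(joshua)   # {'eat_candy': 5, 'live': 6}
```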

Yet another wrinkle I decided I needed to manage was goal duration ambiguity. You might recall that I already tangled with this when trying to distinguish short-term from long-term states. Well, goals have a similar issue. If I say that I "want to eat a taco," that doesn't mean eating tacos is one of my fundamental goals. I don't want to be doing it constantly for all time. (I really don't.) Eating a taco is a mere subgoal (possibly of "survive" or "be comfortable" or "enjoy pleasure" or all three) and is very ephemeral; as soon as I've eaten one, I'll stop wanting to eat, until some unspecified time when the urge hits me again.

Since it's hard to know the difference between a persistent fundamental goal and an ephemeral subgoal without a lot of background knowledge that Acuitas doesn't have yet ... he'll ask a followup question. I settled on "Is that generally true?" but I'm not sure that wording makes it intuitively clear what's being asked. If you were talking to him, told him "I want ..." and were asked "Is that generally true," do you think you'd get the thrust of the question? What might be a better wording?
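For illustration, here is roughly what that branching could look like, with the followup question deciding whether the desire goes into the persistent goal model or gets treated as a short-term state. Every name in this sketch is a placeholder, not Acuitas's internal API.

```python
# Rough sketch of the follow-up flow: when "X wants Y" arrives, ask whether
# the desire is generally true and route it accordingly (hypothetical names).

def handle_want_statement(agent, goal, ask, goal_model, short_term_state):
    """Ask a clarifying question, then store the goal either as a persistent
    entry in the agent's goal model or as an ephemeral short-term state."""
    answer = ask("Is that generally true?")
    if answer.strip().lower() in ("yes", "y"):
        goal_model.setdefault(agent, {})[goal] = 5   # persistent goal, default priority
    else:
        short_term_state.append((agent, "wants", goal))  # ephemeral, expected to lapse

# "I want to eat a taco." -> answered "no", so it stays ephemeral
goals, short_term = {}, []
handle_want_statement("Jenny", "eat_taco", lambda q: "no", goals, short_term)
print(short_term)   # [('Jenny', 'wants', 'eat_taco')]
```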

In the course of setting this up, I found out that ephemerality filtering and storage to the short-term state database were casualties of the Conversation Engine upgrade; I had never fully restored that functionality in the new CE. So I had to bring that back before I could apply it to goal-related sentences as well. The whole code base needs a sign that says "pardon our dust," ha.

The last thing I threw in was an enhancement to the Moral Reasoning model, for detection of "perverse" goals. By this I mean any goal that would seem (by the reasoner's standards) to be fundamentally ridiculous or evil. It's a goal that, when seen in others, the reasoner does not respect.

What's an example of a goal that I would call perverse? Perhaps you've heard of the Ameglian Major Cows, who appear in various books by Douglas Adams. I've brought them up before when discussing AI, because they're such a useful (if alarming) thought experiment. These particular Cows are fully sapient. They can speak fluently, use tools, make plans, etc. And their highest goal in life is to die and be eaten as soon as possible. This goal was bred or engineered into them, on the presumption that there's nothing wrong with eating a creature that gives explicit consent. But at least one human in the scene instinctively finds the situation disgusting ... and I agree with him. Here is a goal that is not in the true best interest of its owner; it was manufactured in him by others for their self-serving reasons. The AMC wants something he *shouldn't* want.

And yet it's still a real goal, for all functional purposes. AMCs will do everything they can to get eaten. I've seen people try to define "intelligence" in such a way that the AMCs would not qualify as intelligent, on account of having "stupid" goals. But what do you call a creature that can argue with you about how tasty his shoulder would be, if not intelligent? I've seen others strenuously insist that no intelligent entity could retain an anti-self-preservation goal; it would simply realize the goal was stupid and change it. But if it does not have any self-preservation goal, then how can it realize an anti-self-preservation goal is stupid? When you drill down through your reasons for wanting anything, eventually you'll end up at "just because I like it." Fundamental goals are arbitrary.[1]

So the only way to find out that a fundamental goal is perverse would be to compare it to some even more fundamental gold standard. Humans tend to use their moral and aesthetic intuitions as such a standard; when we insist that somebody "shouldn't want" a thing, that's what we're invoking. Whether these intuitions reflect laws of reality in some way, or are themselves arbitrary, is not a debate I'll get into here (though I happen to think the former). Bottom line, we have them, and we probably want our AIs to share our most basic and common values, for best compatibility with us. The example of the AMC is "merely" self-destructive; perverse goals that lead the agent to harm others are also possible, and need to be avoided and opposed.

[Comic from Freefall by Mark Stanley. The goal of a moral argument is to show someone that their behavior is inconsistent with core moral principles you both share. Someone who doesn't share your core morals can't be argued into agreeing with you. So giving AIs compatible moral standards is pretty important.]

So I put some rudimentary hooks into the goal learning process for tagging these. A goal can be perverse by its nature (e.g. "I want to hate people") or by its priority (e.g. "I want to maximize my pleasure more than I want anyone else to survive"). A goal marked "perverse" in the goal model can be quickly interpreted as one that Acuitas wants to see fail - when considering the motives of a character in a story, for instance. The mechanisms for determining perversity are similar to the ones for resolving inter-character goal conflict that I discussed previously, with this addition: goals that directly contradict certain basic and morally-approved goals (altruism, self-preservation, etc.) get marked perverse.
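To make that concrete, here is an illustrative (and deliberately crude) version of the two checks. The goal names, the approved-goal set, and the contradiction test are simplified stand-ins of mine for the real reasoning, not the actual code.

```python
# Illustrative check for "perverse" goals, loosely following the two cases
# described above; the goal representation and names are invented for this sketch.

APPROVED_GOALS = {"survive", "be_altruistic", "maintain_function"}

def contradicts(goal, approved):
    """Crude stand-in for real reasoning: treat 'not_<x>' or 'hate_*'-style
    goals as direct contradictions of the approved goal <x>."""
    return goal == "not_" + approved or (goal.startswith("hate_") and approved == "be_altruistic")

def is_perverse(goal, priority, others_survival_priority=5):
    # Case 1: perverse by nature - the goal directly opposes a basic,
    # morally approved goal (e.g. "I want to hate people").
    if any(contradicts(goal, g) for g in APPROVED_GOALS):
        return True
    # Case 2: perverse by priority - a self-serving goal ranked above
    # other agents' survival (e.g. maximizing pleasure over others' lives).
    if goal == "maximize_own_pleasure" and priority > others_survival_priority:
        return True
    return False

print(is_perverse("hate_people", 3))            # True  (by nature)
print(is_perverse("maximize_own_pleasure", 9))  # True  (by priority)
print(is_perverse("eat_candy", 2))              # False
```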

Whew! I felt as though I didn't accomplish all that much this month. Current world events and the ensuing stress levels have not exactly been kind to my brain. But now that I write it all up, maybe I added quite a bit, in spite of my whining. The one caveat is that many of the new features have only been lightly tested.

Until the next cycle,
Jenny

[1] It is quite possible for an intelligent mind to realize that one of its instrumental goals (subgoals) is stupid, by finding out that the subgoal is actually working against the fundamental goals it was supposed to serve. This sort of inconsistency can be discovered via logical reasoning. But a fundamental goal cannot be overthrown in this way, since there *is* no deeper goal that it exists to serve.
