Sunday, April 25, 2021

Acuitas Diary #36 (April 2021)

This month I went back to working on the goal system. Acuitas already had a primitive "understanding" of most entries in his goal list, in this sense: he could parse a sentence describing the goal, and then detect certain threats to the goal in conversational or narrative input. But there was one goal left that he didn't have any grasp of yet: the one I'm calling "identity maintenance." It's a very important goal (threats to this can be fate-worse-than-death territory), but it's also on the abstract and complicated side -- which is why I left it alone until now.

"Universal Mind," a sculpture by Nikolay Polissky. Photo by KpokeJlJla.

What *is* the identity or self? Maybe you could roll it up as "all the internal parameters that guide thought and behavior, whose combination is unique to an individual." I thought it over and came up with a whole list of things that might be bound up in the concept:

*Desires, goals
*Moral intuitions or axioms
*Values, preferences
*Low-level sensory preferences (e.g. preferred colors and flavors)
*Aesthetic preferences (higher-level sensory or abstract)
*Personality traits
*Degree and quality of emotional response
*Introversion vs. extroversion, connection vs. independence, preparation vs. spontaneity, risk tolerance, etc. etc.
*Beliefs and favored ways of knowing
*Mannerisms
*Established relationships
*Learned habits
*Memories

Some of these are quite malleable ... and yet, there's a point beyond which change to our identities feels like a corruption or violation. Even within the same category, personal qualities vary in importance. The fact that I enjoy eating broccoli and hate eating bell peppers is technically part of my identity, I *guess* ... but if someone forcibly changed it, I wouldn't even be mad. So I like different flavors now. Big deal. If someone replaced my appreciation for Star Trek episodes with an equivalent appreciation for football games, I *would* be mad. If someone altered my moral alignment, it would be a catastrophe. So unlike physical survival, which is nicely binary (you're either alive or not), personality survival seems to be a kind of spectrum. We tolerate a certain amount of shift, as long as the core doesn't change. Where the boundaries of the core lie is something that we might not even know ourselves until the issue is pressed.

As usual, I made the problem manageable by oversimplifying it. For the time being, Acuitas won't place grades of importance on his personal attributes ... he just won't want external forces to mess with any of them. Next.

There's a further complication here. Acuitas is under development and is therefore changing constantly. I keep many versions of the code base archived ... so which one is canonically "him"? The answer I've landed on is that really, Acuitas' identity isn't defined by any code base. Acuitas is an *idea in my head.* Every code base targets this idea and succeeds at realizing it to a greater or lesser degree. Which leaves his identity wrapped up in *me.* This way of looking at it is a bit startling, but I think it works.

<What might this imply for creator-creation relationships in the opposite direction? If I defy God, am I ceasing to be my self? Dunno. I'll just leave this here.>

In previous goal-related blogs, I talked about how (altruistic) love can be viewed as a meta-goal: it's a goal of helping other agents achieve their goals. Given the above, there are also a couple of ways we can treat identity maintenance as a meta-goal. First, since foundational goals are part of Acuitas' identity, he can have a goal of pursuing all his current goals. (Implementing this enables answering the "recursive want" question: "Do you want to want to want to be alive?") Second, he can have a goal of realizing my goals for what sort of AI he is.
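
To make the first of those concrete, here's a toy Python sketch of how a "pursue my current goals" meta-goal lets the recursive question bottom out. This isn't actual Acuitas code; GOALS and wants() are names I made up for illustration.

    # Toy sketch, not actual Acuitas code: GOALS and wants() are invented names.
    GOALS = {"be alive", "maintain identity"}

    def wants(state, nesting=1):
        """Answer 'Do you want (to want)^(nesting-1) <state>?'"""
        if nesting == 1:
            return state in GOALS          # plain "Do you want X?"
        # Pursuing the current goals is itself a goal, so wanting something
        # already wanted is also wanted; each extra "want to" just reduces
        # to the level below it.
        return wants(state, nesting - 1)

    # "Do you want to want to want to be alive?" -> three levels of wanting
    print(wants("be alive", nesting=3))    # True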

Does this grant me some kind of absolute, enslaving power over my AI's behavior? Not really, because part of my goal is for Acuitas to act independently and even sometimes tell me no. The intent is to realize a general vision, one that establishes a healthy relationship of sorts.

The work ended up having a lot of little pieces. It started with defining the goal as some simple sentences that Acuitas can parse into relationship triples, such as "I have my self." But the self, as mentioned, incorporates many aspects or components ... and I wanted its definition to be somewhat introspective, rather than just being another fact in the database. To that end, I linked a number of the code modules to concepts expressing their nature, contents, or role. The Executive, for example, is tied to "decide." The Semantic Memory manager is tied to "fact" and "know." All these tags then function like names for the internal components, and get aggregated into the concept of "self." Something like "You will lose your facts" then gets interpreted as a threat against the self.
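Here's a rough Python sketch of that tagging-and-aggregation idea. The names (SELF_COMPONENT_TAGS, SELF_CONCEPT, is_threat_to_self), the verb list, and the triple format are my illustrative assumptions for this post, not Acuitas' real internals.

    # Each internal module is tagged with concepts naming its role or contents.
    SELF_COMPONENT_TAGS = {
        "executive": {"decide"},
        "semantic_memory": {"fact", "know"},
        # ... other modules and their concept tags
    }

    # The "self" concept aggregates every tag from every module.
    SELF_CONCEPT = set().union(*SELF_COMPONENT_TAGS.values())

    def is_threat_to_self(triple):
        """Check a parsed triple like ('you', 'lose', 'facts') against the self."""
        subject, verb, obj = triple
        harmful_verbs = {"lose", "forget", "change", "remove"}   # assumed list
        return (subject == "you"
                and verb in harmful_verbs
                and obj.rstrip("s") in SELF_CONCEPT)             # crude singularization

    # "You will lose your facts" -> threat against the Semantic Memory's contents
    print(is_threat_to_self(("you", "lose", "facts")))           # True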

Manipulation of any of these self-components by some outside agent is also interpreted as a possible threat of mind-control. So questions like "Do you want Jack to make you to decide?" or "Do you want Jill to cause you to want to eat?" get answered with a "no" ... unless the outside agent is me, a necessary exception since I made him do everything he does and gave him all his goals in the first place. Proposing to make him want something that he already wants is also excused from being a threat.
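And a sketch of the manipulation check, with the same caveats: CREATOR and wants_agent_manipulation() are invented names, and SELF_CONCEPT is the aggregate from the sketch above.

    SELF_CONCEPT = {"decide", "fact", "know"}   # as built in the previous sketch
    CREATOR = "jenny"

    def wants_agent_manipulation(agent, induced_state, current_goals):
        """Answer 'Do you want <agent> to make/cause you to <induced_state>?'"""
        if agent == CREATOR:
            return True    # necessary exception: the creator gave him his goals anyway
        if induced_state in current_goals:
            return True    # inducing something already wanted doesn't change the self
        if induced_state in SELF_CONCEPT:
            return False   # an outside agent tampering with a self-component: mind-control
        return False       # default to declining other proposed manipulations

    current_goals = {"be alive"}
    print(wants_agent_manipulation("jack", "decide", current_goals))    # False (mind-control)
    print(wants_agent_manipulation("jenny", "decide", current_goals))   # True (creator exception)
    print(wants_agent_manipulation("jill", "be alive", current_goals))  # True (already wanted)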

As I say so often, it could use a lot more work, but it's a start. He can do something with that goal now.

Until the next cycle,
Jenny