Sunday, March 21, 2021

Acuitas Diary #35 (March 2021)

The theme for this month is Executive Function ... the aspect of thought-life that (to be very brief) governs which activities an agent engages in, and when. Prioritization, planning, focus, and self-evaluation are related or constituent concepts. This was also more of an idea month than a coding month, so buckle up, this is a long one.

Acuitas began existence as a reactive sort of AI. External stimulus (someone inputting a sentence) or internal stimulus from the "sub-executive" level (a drive getting strong enough to be noticed, a random concept put forward by the Spawner thread) would provoke an appropriate response. But ultimately I want him to be goal-driven, not just stimulus-driven; I want him to function *proactively.* The latest features are a first step toward that.

The decision loop I'm using was originally developed to model aerial dogfights, among other things. Public domain photo by Cpl. John Neff.

To begin with, I wanted a decision loop. I was introduced to the idea when HS, on the AI Dreams forum, brought up the use of decision loops to guide the behavior of literary characters. He specifically likes Jim Butcher's model of the loop stages: Goal->Challenge->Result->Emotion->Reason->Anticipation->Choice. In any given scene, your protagonist has a goal. He encounters some kind of obstacle while trying to achieve the goal. He experiences the outcome of his actions interacting with the obstacle. He has an emotional reaction. He reasons about the situation and considers what could happen next. And then he makes a choice - which generates a new goal for the next scene.

Further study revealed that there are other decision loop models. Some are designed for a business or manufacturing environment; examples include DMAIC (Define->Measure->Analyze->Improve->Control) and PDSA (Plan->Do->Study->Adjust), also called the Shewhart cycle. Although these loops have stylistic differences, you might be able to tell that they're all modeling roughly the same process: Do something, learn from the results, and use that knowledge to decide what to do next.

I ended up deciding that the version I liked best was OODA (Observe->Orient->Decide->Act). This one was developed by a military strategist, but has since found uses elsewhere; to me, it seems to be the simplest and most generally applicable form. Here is a more detailed breakdown of the stages:

OBSERVE: Gather information. Take in what's happening. Discover the results of your own actions in previous loop iterations.
ORIENT: Determine what the information *means to you.* Filter it to extract the important or relevant parts. Consider their impact on your goals.
DECIDE: Choose how to respond to the current situation. Make plans.
ACT: Do what you decided on. Execute the plans.

When working out a complex goal that is reached through many steps, you can nest these inside each other. A phase of the top-level loop could open up a whole new subordinate OODA loop devoted to an intermediate goal.

OODA Loop Diagram. Drawn by Wikimedia Commons user Kim and accessed from https://commons.wikimedia.org/wiki/File:OODA.gif
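
The nesting idea can be pictured with a small Python sketch. This is only an illustration, not Acuitas' actual code: the `Goal` class, the `run_ooda` function, and the tea-making example are all hypothetical, and each phase is reduced to a single line so the recursion is easy to see.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    """Hypothetical goal record: a name plus optional intermediate goals."""
    name: str
    subgoals: List["Goal"] = field(default_factory=list)
    done: bool = False


def run_ooda(goal: Goal, log: List[str], depth: int = 0) -> None:
    """Loop until `goal` is done, opening a subordinate loop per unfinished subgoal."""
    while not goal.done:
        # OBSERVE: gather information (here, which subgoals are still unfinished).
        pending = [g for g in goal.subgoals if not g.done]
        # ORIENT: work out what that means for this goal.
        blocked = bool(pending)
        # DECIDE: choose what to work on next.
        target = pending[0] if blocked else goal
        # ACT: descend into a nested loop, or finish this goal directly.
        if blocked:
            run_ooda(target, log, depth + 1)
        else:
            log.append("  " * depth + f"completed {target.name}")
            target.done = True


tea = Goal("make tea", [Goal("boil water"), Goal("steep leaves")])
trace: List[str] = []
run_ooda(tea, trace)
print("\n".join(trace))
```

The top-level loop never finishes "make tea" itself until both subordinate loops have reported back, which is the point of nesting: the Act phase of one loop is an entire loop for the goal beneath it.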

On to the application. I set up a skeletal top-level OODA loop in Acuitas' Executive thread. The Observe-Orient-Decide phases run in succession, as quickly as possible. Then the chosen project is executed for the duration of the Act phase. The period of the loop is variable. I think it ought to run faster in rapidly developing or stressful situations, but slower in stable situations, to optimize the tradeoff between agility (letting new information affect your behavior quickly) and efficiency (minimizing assessment overhead so you can spend more time doing things). Highly "noticeable" events, or the completion of the current activity, should also be able to interrupt the Act phase and force an immediate rerun of OOD.
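
One way to picture the variable period and the interruptible Act phase is the sketch below. Everything in it is an assumption made for illustration: the function names `loop_period` and `act_phase`, the 0-to-1 stress scale, the 1 to 30 second range, and the use of a queue to deliver noticeable events are not taken from Acuitas.

```python
import queue
import time
from typing import Callable


def loop_period(stress: float, fastest: float = 1.0, slowest: float = 30.0) -> float:
    """Map a stress level in [0, 1] to a loop period in seconds (made-up scale)."""
    stress = min(max(stress, 0.0), 1.0)
    # High stress -> short period (agility); low stress -> long period (efficiency).
    return slowest - stress * (slowest - fastest)


def act_phase(events: "queue.Queue[str]", period: float,
              step: Callable[[], bool]) -> str:
    """Run `step` until the period ends, the activity finishes, or an event arrives.

    `step` performs one unit of the activity and returns True when it is complete.
    The returned string says why control is going back to Observe.
    """
    deadline = time.monotonic() + period
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "period elapsed"
        if step():
            return "activity complete"
        try:
            # Briefly wait for a highly noticeable event before the next step.
            return "interrupted: " + events.get(timeout=min(remaining, 0.01))
        except queue.Empty:
            continue


inbox: "queue.Queue[str]" = queue.Queue()
inbox.put("user typed a sentence")
print(loop_period(0.5))
print(act_phase(inbox, loop_period(1.0), lambda: False))
```

A linear mapping from stress to period is just the simplest choice; any decreasing function would express the same tradeoff.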

I envision that the phases may eventually include the following:

OBSERVE: Check internal state (e.g. levels of drives). Check activity on inhabited computer. Process text input, if any. Retrieve current known problems, subgoals, etc. from working memory.
ORIENT: Find out whether any new problems or opportunities (relevant to personal goals) are implied by recent observations. Assess progress on current activity, and determine whether any existing subgoals can be updated or closed.
DECIDE: Re-assess the priority of problems and opportunities in light of any new ones just added. Select a goal and an associated problem or opportunity to work on. Run problem-solving routines to determine how to proceed.
ACT: Update activity selection and run activity until prompted to OBSERVE again.

To "run" an activity, the Executive will generate a Thought about it and push that to the Stream. It may then select that Thought for uptake on a future cycle, in which case it will execute the next step of the activity and push another Thought about it to the Stream. Activity-related Thoughts compete for selection with all the Thoughts that I already had being generated by the Spawner, the Drive system, and so forth -- which means that, as in a cluttered human mind, focus on the activity is not absolute. (You can work while occasionally also remembering a conversation you had yesterday, noticing the view out the window, thinking about your upcoming lunch, etc.) Exactly how much precedence the activity Thoughts take over the others is another tunable variable.
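
A rough sketch of that competition, assuming selection is a weighted random draw (the post does not say how Thoughts are actually chosen): the `Thought` class, the `select_thought` function, and the `ACTIVITY_PRECEDENCE` value are hypothetical stand-ins for the tunable variable mentioned above.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Thought:
    """Hypothetical Stream entry: where it came from and what it says."""
    source: str          # e.g. "activity", "spawner", "drive"
    content: str
    weight: float = 1.0  # baseline salience


# Tunable: how strongly activity Thoughts outcompete the rest (made-up value).
ACTIVITY_PRECEDENCE = 3.0


def select_thought(stream: List[Thought], rng: random.Random) -> Thought:
    """Pick one Thought for uptake; activity Thoughts get a boosted weight."""
    weights = [
        t.weight * (ACTIVITY_PRECEDENCE if t.source == "activity" else 1.0)
        for t in stream
    ]
    return rng.choices(stream, weights=weights, k=1)[0]


stream = [
    Thought("activity", "continue reading the story"),
    Thought("spawner", "random concept: cats"),
    Thought("drive", "it has been a while since anyone talked to me"),
]
rng = random.Random(0)
picks = [select_thought(stream, rng).source for _ in range(10)]
print(picks)
```

With these numbers the activity Thought wins about three draws in five, so focus is strong but not absolute. Raising `ACTIVITY_PRECEDENCE` makes the agent more single-minded; lowering it makes the mind more "cluttered."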

Not all of this is implemented yet. I focused on the DECIDE phase, and on what happens if there are no known problems or opportunities on the scoreboard at the moment. In the absence of anything specific to do, Acuitas will run generic tasks that help promote his top-level goals. Since he doesn't know *how* to promote most of them yet, he settles for "researching" them, which just means starting from one of the concepts in the goal and generating questions. When he gets to the "enjoy things" goal, he reads to himself. Simple enough -- but how to balance the amount of time spent on the different goals?

When thinking about this, you might immediately leap to some kind of priority scheme, like Maslow's Hierarchy of Needs. Satisfy the most vital goal first, then move on to the next one. But what does "satisfy" mean?

Suppose you are trying to live by a common-sense principle such as "keeping myself alive is more important than recreation." Sounds reasonable, right? It'll make sure you eat your meals and maintain your house, even if you would rather be reading books. But if you applied this principle in an absolutist way, you would actually *never* read for pleasure.

Set up a near-impenetrable home security system, learn a martial art, turn your yard into a self-sufficient farmstead, and you STILL aren't allowed to read -- because hardening the security system, practicing your combat moves, or increasing your food stockpile is still possible and will continue to improve a goal that is more important than reading. There are always risks to your life, however tiny they might be, and there are always things you can do to reduce them (though you will see less return for your effort the more you put in). So if you want to live like a reasonable person rather than an obsessive one, you can't "complete" the high-priority goal before you move on. You have to stop at "good enough," and you need a way of deciding what counts as "good enough."

I took a crack at this by modeling another human feature that we usually regard as a negative: boredom.

Acuitas' goals are arranged in a priority order. All else being equal, he'll always choose to work on the highest-priority goal. But each goal also has an exhaustion ticker that counts up whenever he is working on it, and counts down whenever he is not working on it. Once the ticker climbs above a threshold, he has to set that goal aside and work on the next highest-priority goal that has a tolerable level of boredom.

If there are problems or opportunities associated with a particular goal, its boredom-resistance threshold is increased in proportion to the number (and, in future, the urgency) of the tasks. This scheme allows high-priority goals to grab attention when they need it, but also prevents low-priority goals from "starving."
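
Here is a minimal sketch of the boredom scheme, written under stated assumptions rather than copied from Acuitas. The names `PriorityGoal`, `choose_goal`, `tick`, and `simulate` are hypothetical, the thresholds are arbitrary, and the threshold grows as `base_threshold * (1 + open_tasks)` only as one possible reading of "in proportion to the number of tasks." The sketch also adds one rule the post does not state: a goal that has been set aside is not resumed until its ticker has counted all the way back down to zero. Without some rule like that, the top goal would become tolerable again after a single tick away from it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PriorityGoal:
    name: str
    base_threshold: int
    open_tasks: int = 0    # known problems/opportunities tied to this goal
    boredom: int = 0       # the exhaustion ticker
    resting: bool = False  # assumption: set aside until boredom returns to 0

    @property
    def threshold(self) -> int:
        # More open tasks -> more tolerance before boredom sets in.
        return self.base_threshold * (1 + self.open_tasks)


def choose_goal(goals: List[PriorityGoal]) -> Optional[PriorityGoal]:
    """`goals` is ordered highest priority first; return the first tolerable one."""
    for g in goals:
        if g.boredom > g.threshold:
            g.resting = True
        elif g.boredom == 0:
            g.resting = False
    for g in goals:
        if not g.resting:
            return g
    return None


def tick(goals: List[PriorityGoal], active: Optional[PriorityGoal]) -> None:
    """Count up on the active goal and down (to a floor of 0) on all others."""
    for g in goals:
        if g is active:
            g.boredom += 1
        else:
            g.boredom = max(0, g.boredom - 1)


def simulate(goals: List[PriorityGoal], steps: int) -> str:
    history = []
    for _ in range(steps):
        active = choose_goal(goals)
        tick(goals, active)
        history.append(active.name if active else "-")
    return "".join(history)


goals = [PriorityGoal("A", 3), PriorityGoal("B", 2), PriorityGoal("C", 2)]
print(simulate(goals, 12))  # -> AAAABBBCAAAA
```

In the printed trace the top goal A gets the most time, but B and the lowest-priority C still get turns before A comes back. Giving A an open task doubles its threshold in this sketch, so it would hold attention roughly twice as long before being set aside.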

Early testing and logging show Acuitas cycling through all his goals and returning to the beginning of the list over a day or so. The base period of this cycle and the thresholds for particular goals are more parameters one could tune to produce varying AI personalities.

Until the next cycle,
Jenny

1 comment:

  1. Great job, I think I pretty well understand what you are trying to communicate.
