A couple of months ago I described my plans to implement trial-and-error learning so Acuitas can play a hidden-information game. This month I've taken the first steps. I'm moving slowly, because I've also had a lot of code cleanup and old bug fixing to do - but I at least got the process of "rule formation" sketched out.
Before any rules can be learned, Acuitas needs a way of collecting data. If you read the intro article, you might recall that he begins the game by selecting an affordance (an obvious possible action) and an object (something the action can be done to) at random. In the particular game I'm working on, all affordances are of the form "Put [one zoombini out of 16 available] on the [left, right] bridge," i.e. there are 32 possible moves. Once Acuitas has randomly tried one of these, he gets some feedback: the game program will tell him whether the selected zoombini makes it across the selected bridge, or not. Then what?
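To make that concrete, here's a minimal Python sketch of the move space and the random first pick. This is purely an illustration, not Acuitas's actual code; the move representation is invented for the example.

```python
import random

# The whole move space: 16 zoombinis x 2 bridges = 32 possible actions.
ZOOMBINIS = range(16)
BRIDGES = ("left", "right")

all_moves = [{"zoombini": z, "bridge": b} for z in ZOOMBINIS for b in BRIDGES]

def pick_random_move(untried):
    """With no feedback yet, any untried move is as good as any other."""
    return random.choice(untried)

move = pick_random_move(all_moves)
print(f"Put zoombini {move['zoombini']} on the {move['bridge']} bridge")
# The game then answers: did this zoombini make it across, or not?
```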
After Acuitas has results from even one attempted action, he stops choosing moves entirely at random. Instead, he'll try to inform his next move with the results of the previous move. Here is the basic principle used: if the previous move succeeded, either repeat the move* or do something similar; if the previous move failed, ensure the next move is different. Success and failure are defined by how the Narrative scratchboard updates goal progress when the feedback from the game is fed into it; actions whose results advance at least one issue are successes, while actions that hinder goals or have no effect on goals at all are failures. Similarity and difference are measured across all the parameters that define a move, including the action being taken, the action's object, and the features of that object (if any).
*Successful moves cannot be repeated in the Allergic Cliffs game. Once a zoombini crosses the chasm, they cannot be picked up anymore and must remain on the destination side. But one can imagine other scenarios in which repeating a good choice makes sense.
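Here's a rough sketch of that choose-similar-after-success, choose-different-after-failure heuristic. The feature table is made up (stand-ins for traits like hair, eyes, nose, and feet), and similarity is just a count of matching parameters; none of this is Acuitas's actual code.

```python
# Hypothetical feature table, filled in for just three zoombinis
# to keep the sketch short.
FEATURES = {
    0: {"hair": "spiky", "eyes": "wide", "nose": "blue", "feet": "wheels"},
    1: {"hair": "spiky", "eyes": "sleepy", "nose": "red", "feet": "sneakers"},
    2: {"hair": "ponytail", "eyes": "wide", "nose": "blue", "feet": "skates"},
    # ...and so on for the rest of the sixteen.
}

def similarity(move_a, move_b):
    """Count shared parameters: the bridge chosen, plus each feature
    of the zoombini being moved."""
    score = int(move_a["bridge"] == move_b["bridge"])
    feats_a = FEATURES[move_a["zoombini"]]
    feats_b = FEATURES[move_b["zoombini"]]
    return score + sum(feats_a[k] == feats_b[k] for k in feats_a)

def choose_next(untried, last_move, last_succeeded):
    """After a success, pick the untried move most similar to the last
    one; after a failure, pick the most different one."""
    key = lambda m: similarity(m, last_move)
    return max(untried, key=key) if last_succeeded else min(untried, key=key)

untried = [{"zoombini": z, "bridge": b} for z in FEATURES for b in ("left", "right")]
last = untried.pop(0)   # suppose zoombini 0 on the left bridge was just tried
print(choose_next(untried, last, last_succeeded=False))
```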
Following this behavior pattern, Acuitas should at least be able to avoid putting the same zoombini on a bridge they already failed to cross. But by itself, that's probably not enough to deliver a win. For that, he'll need to start creating and testing cause-and-effect pairs. These are propositions, or what I've been calling "rules." Acuitas compares each new successful action to all his previous successes and determines what they have in common. Any common feature or combination of features is used to construct a candidate rule: "If I do <action> with <features>, I will succeed." Commonalities between failures can also be used to construct candidate rules.
The current collection of rule candidates is updated each time Acuitas tries a new move. If the results of the move violate any of the candidate rules, those rules are discarded. (I'm not yet contemplating probability-based approaches that consider the preponderance of evidence. Rules are binary true/false, and any example that violates a rule is sufficient to declare it false.)
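Put together, the rule machinery from the last two paragraphs might look something like this sketch: candidate rules are the feature combinations shared by all successes (or all failures) so far, and a single contradicting observation discards a rule. Again, the data and names are invented for illustration.

```python
from itertools import combinations

# Each observation pairs a move's parameters (bridge chosen plus the
# zoombini's traits) with whether it succeeded. Values are invented.
history = [
    ({"bridge": "left", "hair": "spiky", "eyes": "wide",
      "nose": "blue", "feet": "wheels"}, True),
    ({"bridge": "left", "hair": "spiky", "eyes": "sleepy",
      "nose": "red", "feet": "sneakers"}, True),
]

def candidate_rules(history, outcome=True):
    """Build rules from whatever all successes (or failures) share.
    A rule is a frozenset of (parameter, value) pairs, read as
    'a move matching all of these will succeed' (or fail)."""
    group = [set(p.items()) for p, ok in history if ok == outcome]
    if not group:
        return set()
    shared = set.intersection(*group)
    # Any common feature, or combination of common features, is a candidate.
    return {frozenset(combo)
            for r in range(1, len(shared) + 1)
            for combo in combinations(sorted(shared), r)}

def prune(rules, move_params, succeeded, outcome=True):
    """Rules are binary: one contradicting example discards a rule.
    A rule is contradicted when all its conditions hold for the move
    but the predicted outcome isn't what actually happened."""
    observed = set(move_params.items())
    return {r for r in rules if not (r <= observed and succeeded != outcome)}

rules = candidate_rules(history)   # successes share bridge=left, hair=spiky
# A spiky-haired zoombini then fails on the left bridge: every candidate
# rule built from those commonalities is contradicted and thrown out.
rules = prune(rules, {"bridge": "left", "hair": "spiky", "eyes": "wide",
                      "nose": "green", "feet": "skates"}, succeeded=False)
print(rules)                       # -> set()
```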
Unfortunately, though I did code all of that up this month, I haven't had the chance to fully test it yet. So there's still a lot of work to do. Once I confirm that rule formation is working, future steps would include the ability to design experiments that test rules, and the ability to preferentially follow rules known with high confidence.
Until the next cycle,
Jenny