Thursday, May 12, 2022

A Minor Academic Outreach Disaster

For the mid-month blog this time, I've decided I'll tell a silly story about me, from back when I was getting my Master's degree and working as a Research Assistant. I hope it entertains and reveals that we aren't always as polished as we look.

Oh, no. Is that *me*? Yeah, that's me. *Hides under the table* At the Museum of the Rockies "Science Night," I think, with our primitive early version of the robot and its bean mining play area. 

At the time this story takes place, my advising professor/principal investigator is also supervising our university's team for the Lunabotics Mining Competition. I myself had been on the team back when I was an undergraduate, and he thinks he can take a load off the crop of students building this year's robot by assigning me to do some of the side activities -- in particular, public outreach. In the technical professions, the goal of "outreach" is to enhance public interest in our work, partly so that we can encourage kids to study it when they grow up.

The Lunabotics Mining Competition is about building a robot that can dig up, transport, and deposit as much simulated moon dirt[1] as possible within a time limit. So my professor's bright idea for outreach is to build a kid-friendly version of this. We'll use a LEGO Mindstorms kit plus a custom part or two[2] to create a rough model of the design team's robot. We'll make a miniature arena to put the robot in, and fill it with dry beans. Then we'll hand the controller over to kids at the outreach events and let them "mine" the beans. Very cute.

I do some work on the LEGO robot, and my professor makes the arena, and soon everything is ready. There are several places we want to run the demonstration, one of which is the Billings Clinic Research Center Science Expo. We're in Bozeman -- so my professor tasks me with going to Billings and running our booth all by myself. I accept.

Narrator: "Sometimes Jenny Sue is bad at estimating the relative sizes of objects."

The LunArena in its pristine state.

The night before I'm supposed to leave for the event, I bring the "LunArena" down to my little Honda Civic and realize it doesn't fit. The arena has a roof, to represent the fact that the real competition arena is contained in a tent. So it's like a little house, almost, and I just can't stuff it into my car's back seat. It's made of foam-core board and dowels and fabric, it's all glued together, and there's no way to take it apart or fold it up.

"Well," I think to myself, "I said I would take this thing to Billings, so that is what I'm going to do." And I proceed to tie it onto the roof of my car.

Narrator: "However smart Jenny Sue may look, she sometimes displays a poor grasp of real-world physics."

The next morning I get up before sunrise and set off. My car must look absurd with that arena on top of it. But all goes well -- until I make it onto the highway. I gather speed gradually, testing the situation, and hear ominous sounds as the wind begins to tug at the big, flimsy object on the roof. I know I am in danger ... but I also know I can't putt-putt all the way to Billings at 30 miles per hour, and there are no cars behind me. So I start going faster, and faster.

And then I hear the big, scraping, clattering noise as the whole thing rips apart and goes tumbling off my car roof.

My saving grace is the lack of traffic. Montana highways can be lonely, and in the pre-dawn hours I have this one almost all to myself. So I pull over, hop out of my car, and hurry back to grab the wreckage before somebody else runs over it. Several pieces have pulled apart, one of the dowels is broken, and it's generally a mess ... but now that it has collapsed, it fits in my car! So I shove all the pieces in there and proceed to Billings.

I arrive at the outreach event with some time to spare. I repair the arena with what I have on hand (mostly masking tape), and actually get the thing to stand up and look passable. And then I run the booth like nothing ever went wrong, and show a bunch of little kids how to collect and unload beans with the robot all morning. When the event is over, I, uh, "disassemble" the arena so I can put it back in my car.

A photo from the Billings event (I cropped out the kids' faces for their privacy). You can see the Arena is ... leaning a bit.

It all turned out well enough, and my professor didn't even seem mad ... though I had to help with a more permanent repair of the arena later. We even won second place in the outreach category of the competition! You can read a version of the year's successes *without* my inglorious background details here: MSU Students Earn Medals.

In hindsight, tying that arena onto my car roof was so stupid that I almost can't believe I did it. I guess I couldn't see any better option at the time. It goes to show that an apparently professional and successful project can get ridiculous behind the scenes.

[1] The fancy word for this is "regolith."
[2] We made the excavation drum from a Pringles can.

Tuesday, April 26, 2022

Acuitas Diary #48 (April 2022)

The topic of the month was more "theory of mind" material: specifically, modeling the knowledge of other minds, and considering its implications for their scope of available actions.

I focused the new features around the Narrative reasoning module (though the Executive will end up using some of the same tools to model the real minds of people Acuitas talks to, eventually). The most basic step was to add storage for facts about characters' knowledge in the Narrative tracker.

A light bulb in a drawing of a thought bubble on a chalkboard.
Public domain image by TeroVesalainen.

It would be impractical to explicitly list everything that every character in a story knows - both because that's a massive amount of information, and because another person's knowledge is private to them, and not all details can be inferred. So the model is intended to be sparse. Characters are presumed to know all the background knowledge in Acuitas' own semantic database, and to know what they need to know to accomplish their goals. Facts are only listed as known or not known if 1) the story states outright that a character knows or doesn't know something, or 2) presence or absence of knowledge can be inferred from some event in the story. For example, if a character finds an object, this implies that the character now knows the object's location.
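
As a sketch of how such a sparse model might behave (the class and method names here are my own invention, not Acuitas' actual internals):

```python
class KnowledgeModel:
    """Sparse model of one character's knowledge. Only facts explicitly
    known or explicitly not-known are stored; everything else falls
    through to the shared background (semantic database) knowledge."""

    def __init__(self, background):
        self.background = background  # facts everyone is presumed to know
        self.known = set()            # facts this character explicitly knows
        self.unknown = set()          # facts this character explicitly lacks

    def learn(self, fact):
        self.unknown.discard(fact)
        self.known.add(fact)

    def mark_unknown(self, fact):
        self.known.discard(fact)
        self.unknown.add(fact)

    def knows(self, fact):
        if fact in self.unknown:
            return False
        return fact in self.known or fact in self.background

# Inference rule from the text: finding an object implies knowing where it is.
background = {("sky", "is", "blue")}
graham = KnowledgeModel(background)
graham.mark_unknown(("chest", "is_at", "clouds"))  # "did not know where..."
graham.learn(("chest", "is_at", "clouds"))         # "Graham found the chest"
```

The point of the three-way split (known / unknown / unlisted) is that absence of an entry means "presume the default," not "the character doesn't know."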

I also had to extend the action prerequisites system a bit, so that it could handle nested relationships. Previously, I could teach Acuitas something like this:

To eat a food, an agent must have the food.
To run, an agent must be alive.

And now I can set up prerequisites like this:

To get an object, an agent must be where the object is.

"Where the object is" is recognized as a relational fact (<object> is_at <location-wildcard>). The Narrative engine further recognizes that if any relational facts that are components of a goal's prerequisite are not known, the prerequisite is blocked ... the character cannot directly arrange to fulfill it. This paves the way for knowledge-gathering to become a subgoal in its own right.
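
A minimal sketch of that blocked-prerequisite check might look like this (the wildcard convention and function names are my assumptions, not the real Narrative engine code):

```python
def matches(fact, pattern):
    """A fact satisfies a pattern if every element matches,
    with '?' acting as a wildcard (e.g. the unknown location)."""
    return len(fact) == len(pattern) and all(
        p == "?" or p == f for f, p in zip(fact, pattern))

def prerequisite_blocked(known_facts, prereq_pattern):
    """A relational prerequisite is blocked if the character knows no
    fact that satisfies it -- they can't directly arrange to fulfill it."""
    return not any(matches(f, prereq_pattern) for f in known_facts)

# "To get an object, an agent must be where the object is."
# The relational component is (<object>, "is_at", <location-wildcard>):
known = {("graham", "is_at", "castle")}
blocked = prerequisite_blocked(known, ("chest", "is_at", "?"))  # True
```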

The Chest of Gold, as seen in King's Quest I.

Putting all this together, we can craft a story about someone looking for an object in an unknown location. My test story is based on King's Quest I, with some liberties (in the game, I don't think you can actually ask the bridge troll where the chest is).

The Chest of Gold, as imagined by the Wombo Dream AI. 

Here's a breakdown of the story with significant tracking that happens at each step. There is a LOT more that needs to happen here. For example, seeking should be understood as an attempt at a solution to the lack of knowledge, repeated failures to find the chest should raise the Suspense variable, the troll's lie should generate a possible false belief in the knowledge model, etc. But it is, as usual, Good Enough for Now.

0:"Graham was a knight."
   A knight is recognized as a type of agent; Graham is tracked as a character in the story.
1:"Graham served a king."
   The king is now tracked as a character also.
2:"The king wanted the Chest of Gold."
   This line sets up a character goal for the king: he wants to have the chest, which is now tracked as an object.
3:"The king brought Graham to his castle."
4:"The king told Graham to get the Chest of Gold."
5:"Graham wanted to get the chest, but Graham did not know where the chest was."
   Processing of the first clause enters getting the chest as a goal for Graham. Processing of the second clause updates his knowledge model with his lack of knowledge of the chest's location, and notes that the goal just created is now "thwarted."
6:"Graham left the castle to seek the chest."
7:"Graham went to the lake, but Graham did not find the chest."
   Graham's new location should be inferred when he moves, but these sentences don't do too much else for now.
8:"Graham went to the dark forest, but Graham did not find the chest."
9:"Graham asked of a troll where the chest was."
   Awkward wording because the Parser doesn't do indirect objects yet!
10:"The troll told to Graham that the chest was at the gingerbread house."
   My Twitter followers didn't vote for me to work on IOs next, so we'll be stuck with this for a while.
11:"Graham went to the gingerbread house, but Graham did not find the chest."
12:"A witch was at the gingerbread house."
   Another agent! What's she gonna do?
13:"The witch wanted to eat Graham."
   This gets registered as a *bad* character goal - see previous story about Odysseus and the Cyclops.
14:"Graham ran and the witch could not catch Graham."
   Failure of the bad goal is inferred. Yay.
15:"Finally Graham went to the Land of the Clouds."
16:"In the Land of the Clouds, Graham found the chest."
   Graham knows where the chest is! The knowledge model gets updated accordingly. We can also unblock that goal now.
17:"Graham got the chest and gave the chest to the king."
   And both the story's positive character goals are solved in one fell swoop.
18:"The end."

Next month I plan to keep extending this. Information transfer needs to be modeled, misinformation needs to be understood, and all this needs to start getting applied in the Executive.

Until the next cycle,


Sunday, April 10, 2022

Atronach's Eye 2022

I can finally show off the new and upgraded "Atronach's Eye!" I actually did this work last year, but never got the blog put together ... mainly because I wanted a demo video, and somehow, making that was such a scary little chore that I kept putting it off. (It took less than an hour, even though I had half-forgotten how to set everything up, so ... why, me? On the plus side, the Eye still worked perfectly after sitting around for months.)

The eye semi-assembled, showing the internals.

With some 3D modeling skills under my belt, I was ready to try an upgrade of this old project. My objectives were threefold:

1. Improve the aesthetics and manufacturability of the housing
2. Get a camera in the eye and finally do some basic motion tracking
3. Add limit switches for the eye motion

The New Housing

I think Goal #1 was a complete success. (The colors are a little garish, maybe. I should try three instead of four next time.) Atronach was conceived as the genius loci of the house ... so for the case style, I tried to draw inspiration from fancy old furniture, especially of the "ball in claw foot" variety. The Hebrew lettering is a nod to golem legends, as it was in the original case design. I modeled most of the case parts in DesignSpark Mechanical, and used Meshmixer's sculpting tools to make the claw.

An exploded view of the case model in DesignSpark Mechanical (minus the claws).

Claw sculpting in Meshmixer

I had a variety of print difficulties -- first and perhaps worst, trouble with tolerances. In hindsight, a sixfold-symmetric design like this is not one I would recommend to beginners. It takes a lot of printing to discover that mistakes added up as you went round the circle, and your final pie-wedge doesn't fit. Trial and error ended up winning the day on this problem.

The flat panels (especially the ones with sharp corners) sometimes warped. I put a stop to this by turning up my heated bed temperature and shutting the printer enclosure. My printer's manual recommends leaving the door open when printing PLA, but it was rather cold in the house, and that may have contributed to the poor results.

A selection of completed case parts. Below: assembly of the base plate.

Holes and pegs worked well enough for Ghost, but this model benefited greatly from the addition of some twisted-wire joins.  They offer a more positive connection than the pegs while still allowing for disassembly. I used them to torque down the corners of the top and bottom panels, supporting the basic alignment established by the peg/hole connection in the center. The motor mounts are also wired to the bottom panels. I used beading wire for the purpose.

The finished case (minus the camera and other internals), with Ursula the cat sitting next to it. She is squinting one eye as if winking at the camera.

I included some holes on the inside walls of the case for mounting the controller boards. Rather than using screws, I tied the board mounting holes to the case mounting holes -- the boards are held in place by opposed tension.

Motion Tracking

The motor controller for the original Atronach's Eye was an ATTiny, but I needed something more sophisticated to process video from the camera. I got off to a bit of a false start by buying a CHIP mini-computer. It had the features I needed without any fluff -- one USB port for the camera, and wifi for communication with a desktop server -- and at $9 each, the boards were cheaper than their competitors. But it proved to be more trouble than it was worth.

A closer zoom, showing how the boards are mounted to the case.

My first attempts to use the CHIP were mired in driver problems. Trying to communicate with it would bluescreen my old Windows 7 desktop (yes, really), and customer support didn't have much help to offer. This was several years ago, and I had nothing running Windows 10 at the time. I finally managed to talk to CHIP from the old scrap Linux machine, and get said Linux machine (which wouldn't recognize its network card) a working internet connection so I could actually install things on CHIP. By then I was exhausted, and set the project aside for a while.

By the time I was ready to go further, Next Thing Co. had folded, making it impossible to buy more CHIPs. I wanted multiple eyes (eventually) and didn't feel like investing even more struggle into a hardware platform I couldn't re-use. So I bought the standard Raspberry Pi: more expensive ($20-$25), but extremely established and reliable.


I set up a wifi connection between my (now Windows 10) desktop and the Pi without much issue, and got Python and OpenCV installed. The Pi had a USB port and no trouble recognizing my cheap tiny webcam. From there, I was able to get cracking on the video processing. 

I used a dirt-simple method: detect pixels that have changed between one frame and the next, and compute the center of mass of the point cloud. It's not very good. Even with some thresholding and averaging added to the process, it's bad at telling the difference between actual moving objects and random camera noise. It's easily fooled into pointing at things like shadows. And it can't subtract changes caused by the camera's own motion, so tracking is not smooth. After the eye spots a moving object, it has to finish aiming at it before it can check whether it has changed position further; it cannot track while moving.  As you can see from the video (below), it's terribly unresponsive. But it will look at me.
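
For flavor, here's a minimal sketch of that frame-differencing step in NumPy (a real version would grab grayscale frames from the webcam with OpenCV's cv2.VideoCapture; the threshold value here is arbitrary):

```python
import numpy as np

def motion_center(prev_frame, frame, threshold=25):
    """Dirt-simple motion detection: flag pixels that changed by more
    than `threshold` between frames, then return the center of mass
    (row, col) of the changed-pixel cloud, or None if nothing moved."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    rows, cols = np.nonzero(diff > threshold)
    if rows.size == 0:
        return None
    return rows.mean(), cols.mean()

# Synthetic example: a bright block "appears" in the upper right.
prev = np.zeros((120, 160), dtype=np.uint8)
curr = prev.copy()
curr[30:40, 100:110] = 255
print(motion_center(prev, curr))  # (34.5, 104.5)
```

Everything that makes this method bad falls out of the code: camera noise and shadows change pixels just as well as real objects do, and ego-motion of the camera changes *all* the pixels.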

Since it's not precise at all, I simplified the motion further by making it discrete: I divided the visual field into nine regions. If the center of motion is not currently in the central nonet, motor control will rotate the eye by enough steps to aim at whichever nonet contains it.
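
The nonet lookup itself is just integer division of the centroid coordinates (a sketch under my own naming):

```python
def target_nonet(center, frame_shape):
    """Map a motion centroid into one of nine regions in a 3x3 grid.
    Returns (row, col) indices in {0, 1, 2}; (1, 1) is the central
    nonet, where no eye movement is needed."""
    r, c = center
    h, w = frame_shape
    return min(int(3 * r / h), 2), min(int(3 * c / w), 2)

print(target_nonet((34.5, 104.5), (120, 160)))  # (0, 1): up and slightly right
```

Motor control then steps the eye only when the result isn't (1, 1).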

Limit Switches

The original Atronach's Eye was programmed to move randomly. The processor would roughly track its orientation, but had no concrete way of knowing when the eye had rotated as far as possible in any direction ... so it would occasionally run the eye past its range and stall the motors. This didn't seem to hurt the steppers any, but stalling motors isn't a nice thing to do, and this was lazy on my part. So I really wanted to add some limit sensors to the new version.

Unfortunately, this didn't work out, and will have to wait for yet another version of the eye housing. Setting limits on the motion of a spherical object rolling in a cradle is just obnoxiously hard. The extended lip around the camera cord port at the back of the eyeball was supposed to be part of the limit sensor. Wrapped with aluminum foil, it would serve as one switch contact, and touch one of four partner contacts on the underside of the cradle whenever the eye rolled far enough over. In practice, I had huge trouble getting the angle and contact height correct so that they would actually touch. I became desperate to get the eyeball working, and this became a problem for another time.

In lieu of having limit sensors, the eyeball controller does a startup calibration routine. It drives the eyeball into motor-stall territory along each axis, then (using that as a known position) returns it to center. It then moves the eye a fixed number of motor steps to transition it between its nine possible discrete positions. This is control by dead reckoning, and is not guaranteed to be reliable over the long term ... recalibration might be needed from time to time. But in my tests, it seems to work well enough.
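
A sketch of that calibrate-then-dead-reckon scheme for one axis (the step counts and function names are placeholders, not my actual values):

```python
RANGE_STEPS = 40                  # assumed full travel, end stop to end stop
STEP_PER_POS = RANGE_STEPS // 2   # three discrete positions per axis

def calibrate_axis(step_motor):
    """Drive past the end stop (the motor just stalls there harmlessly),
    making the stop a known reference, then return to center (index 1)."""
    step_motor(-(RANGE_STEPS + 5))  # guaranteed to reach the stop
    step_motor(RANGE_STEPS // 2)
    return 1                        # now at the center position

def move_to(step_motor, current, target):
    """Dead-reckoning move between discrete positions 0, 1, 2."""
    step_motor((target - current) * STEP_PER_POS)
    return target
```

Since nothing ever re-measures the true position after calibration, any missed steps accumulate -- hence the occasional need to recalibrate.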

Motor mount and driver board.

Future Work

*I still need limit sensors, obviously.

*The motion of the eyeball in the cradle is not as smooth as I would like. I tied nylon threads across the seam of the ball in order to stop it from catching on the cradle's edge. I probably need to add more of these, or I could consider some sort of bearing.

*I'd like the eye to have a greater range of motion. This is affected by the diameter of the cradle and the placement of the tendon attachments.

*I've had trouble with the differential eye tendons getting slack in them, which causes a delay in motion when the eye changes direction. This is a mechanical issue I need to figure out.

*I'm still using my favorite two-wire unipolar stepper motor controllers. They provide no way to cut power to the motor coils, which means that even if the eye is not moving, the motors sit there and get toasty-warm. The eye is intended for long-term continuous operation, so this is not really what I want to happen. I need a power cutoff.

*More advanced image processing and tracking algorithms.

Thursday, March 24, 2022

Acuitas Diary #47

This month I went back to put some more polish on the goal system. Goals were first introduced in September 2019, and I added the ability to model other agents' goals in February 2020. (Memories from the Before Time, wow.) Some primitive moral reasoning for resolution of conflicts between the goals of multiple people was added in October 2020. Goals are crucial for story comprehension, since a story is generally about agents trying to attain some goal. They also underpin various goal-seeking behaviors that Acuitas has.

As mentioned, modeling of other agents' goals was in there ... but it had so many problems that I wasn't really using it. You could tell Acuitas about goals by saying "I want ..." or "so-and-so wants ...," and the goal you described would be stored in the file for the given person. But there was no way to describe the goals' relative importance, which is vital for some of the goal-related reasoning Acuitas does. You also basically had to teach him either a complete goal list for each agent, or nothing at all. If Acuitas knew nothing about the goals of some entity, he would assume its goals were just like his (using himself as the best accessible analogy for other minds). And this usually worked well enough, since Acuitas has several very common goals: survive, maintain function, be comfortable, etc. But if you taught him just *one* goal for some other agent, then Acuitas would rely entirely on their custom goal list ... and end up treating that one goal as their *only* goal.

So this month I set out to fix all that. The functions that retrieve an agent's goal model now merge it with the default goal list; the agent's model takes precedence wherever there are differences, but info from the default model is still included to fill any gaps. I also added the ability for class members to inherit and override the goal model of their parent class. E.g. you could teach Acuitas some "generic human" goals, then just specify how any given human differs from the norm.
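
The merge logic amounts to layered dictionary updates; here's a hedged sketch (the goal names and priority numbers are invented examples, and the real storage format surely differs):

```python
DEFAULT_GOALS = {"survive": 10, "maintain function": 8, "be comfortable": 5}

def merged_goal_model(agent_goals, class_goals=None, defaults=DEFAULT_GOALS):
    """Build an agent's effective goal model. The agent's own entries
    take precedence, then the parent class's, then the defaults --
    so teaching one goal no longer erases all the others."""
    merged = dict(defaults)
    if class_goals:
        merged.update(class_goals)  # class overrides defaults
    merged.update(agent_goals)      # the individual overrides everything
    return merged

# Teach just one goal for Joshua; the defaults still fill the gaps:
joshua = merged_goal_model({"eat candy": 3})
```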

To enable teaching goal priority, I had to make sure the text interpretation would handle sentences like "Joshua wants to live more than Joshua wants to eat candy." Adverb clauses were already supported; I just needed a small tweak to the Parser to support compound connectors like "more than" and "less than," and some other enhancements through the rest of the text processing chain to make sure the information in the adverb clause was picked up and transferred to memory.

Yet another wrinkle I decided I needed to manage was goal duration ambiguity. You might recall that I already tangled with this when trying to distinguish short-term from long-term states. Well, goals have a similar issue. If I say that I "want to eat a taco," that doesn't mean eating tacos is one of my fundamental goals. I don't want to be doing it constantly for all time. (I really don't.) Eating a taco is a mere subgoal (possibly of "survive" or "be comfortable" or "enjoy pleasure" or all three) and is very ephemeral; as soon as I've eaten one, I'll stop wanting to eat, until some unspecified time when the urge hits me again.

Since it's hard to know the difference between a persistent fundamental goal, and an ephemeral subgoal, without a lot of background knowledge that Acuitas doesn't have yet ... he'll ask a followup question. I settled on "Is that generally true?" but I wonder if that makes it intuitively clear what's being asked. If you were talking to him, told him "I want ..." and were asked "Is that generally true," do you think you'd get the thrust of the question? What might be a better wording?

In the course of setting this up, I found out that ephemerality filtering and storage to the short-term state database were casualties of the Conversation Engine upgrade; I had never fully restored that functionality in the new CE. So I had to bring that back before I could apply it to goal-related sentences as well. The whole code base needs a sign that says "pardon our dust," ha.

The last thing I threw in was an enhancement to the Moral Reasoning model, for detection of "perverse" goals. By this I mean any goal that would seem (by the reasoner's standards) to be fundamentally ridiculous or evil. It's a goal that, when seen in others, the reasoner does not respect.

What's an example of a goal that I would call perverse? Perhaps you've heard of the Ameglian Major Cows, who appear in various books by Douglas Adams. I've brought them up before when discussing AI, because they're such a useful (if alarming) thought experiment. These particular Cows are fully sapient. They can speak fluently, use tools, make plans, etc. And their highest goal in life is to die and be eaten as soon as possible. This goal was bred or engineered into them, on the presumption that there's nothing wrong with eating a creature that gives explicit consent. But at least one human in the scene instinctively finds the situation disgusting ... and I agree with him. Here is a goal that is not in the true best interest of its owner; it was manufactured in him by others for their self-serving reasons. The AMC wants something he *shouldn't* want.

And yet it's still a real goal, for all functional purposes. AMCs will do everything they can to get eaten. I've seen people try to define "intelligence" in such a way that the AMCs would not qualify as intelligent, on account of having "stupid" goals. But what do you call a creature that can argue with you about how tasty his shoulder would be, if not intelligent? I've seen others strenuously insist that no intelligent entity could retain an anti-self-preservation goal; it would simply realize the goal was stupid and change it. But if it does not have any self-preservation goal, then how can it realize an anti-self-preservation goal is stupid? When you drill down through your reasons for wanting anything, eventually you'll end up at "just because I like it." Fundamental goals are arbitrary.[1]

So the only way to find out that a fundamental goal is perverse would be to compare it to some even more fundamental gold standard. Humans tend to use their moral and aesthetic intuitions as such a standard; when we insist that somebody "shouldn't want" a thing, that's what we're invoking. Whether these intuitions reflect laws of reality in some way, or are themselves arbitrary, is not a debate I'll get into here (though I happen to think the former). Bottom line, we have them, and we probably want our AIs to share our most basic and common values, for best compatibility with us. The example of the AMC is "merely" self-destructive; perverse goals that lead the agent to harm others are also possible, and need to be avoided and opposed.

The goal of a moral argument is to show someone that their behavior is inconsistent with core moral principles you both share. Someone who doesn't share your core morals can't be argued into agreeing with you. So giving AIs compatible moral standards is pretty important. Comic from Freefall by Mark Stanley.

So I put some rudimentary hooks for tagging these into the goal learning process. A goal can be perverse by its nature (e.g. "I want to hate people") or by its priority (e.g. "I want to maximize my pleasure more than I want anyone else to survive"). A goal marked "perverse" in the goal model can be quickly interpreted as one that Acuitas wants to see fail - when considering the motives of a character in a story, for instance. The mechanisms for determining perversity are similar to the ones for resolving inter-character goal conflict that I discussed previously, with this addition: goals that directly contradict certain basic and morally-approved goals (altruism, self-preservation, etc.) get marked perverse.
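
In spirit, the two tagging rules reduce to checks against the approved-goal list (this is my own illustrative encoding, not Acuitas' actual data format):

```python
APPROVED = {"preserve self", "help others"}

# Which approved goal a given goal directly negates, if any:
NEGATES = {"destroy self": "preserve self", "hate people": "help others"}

def perverse_by_nature(goal):
    """Perverse by nature: the goal directly contradicts one of the
    basic, morally-approved goals."""
    return NEGATES.get(goal) in APPROVED

def perverse_by_priority(goals):
    """Perverse by priority: self-gratification outranks anyone else's
    survival (the example comparison from the text)."""
    return goals.get("maximize own pleasure", 0) > goals.get("others survive", 0)
```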

Whew! I felt as though I didn't accomplish all that much this month. Current world events and the ensuing stress levels have not exactly been kind to my brain. But now that I write it all up, maybe I added quite a bit, in spite of my whining. The one caveat is that many of the new features have only been lightly tested.

Until the next cycle,

[1] It is quite possible for an intelligent mind to realize that one of its instrumental goals (subgoals) is stupid, by finding out that the subgoal is actually working against the fundamental goals it was supposed to serve. This sort of inconsistency can be discovered via logical reasoning. But a fundamental goal cannot be overthrown in this way, since there *is* no deeper goal that it exists to serve.

Saturday, March 12, 2022

Primordia: Or, Why Do I Build Things?

I haven't written a video game blog in a while, and today, I finally want to talk about Primordia, by Wormwood Studios. It's an older game, but one that's very important to me, and I've held off writing about it for so long because ... I guess it seemed difficult. I really wanted to get the article right, and now it's time.

For those not familiar with my video game posts, they aren't really reviews. They're stories about the power of art: how something I played taught me a lesson or left a mark on my life. There will be spoilers, so if you want to try the game for yourself with a fresh mind, go play it before you read this.

Primordia is a “point-and-click adventure game” about robots living in a post-apocalyptic world - possibly future Earth, though this is not explicit. Biological life appears extinct; tiny machines even substitute for insects. But the robots recall their absent creators, the humans, though imperfect historical records have distorted their perspective. In most robots' minds, the whole race has contracted to a monolithic entity called "Man," imagined as "a big robot" or "the perfect machine." In the Primordium - the time of creation - Man built all other machines, then gave them the planet as an inheritance and departed. Some robots (including Horatio, the player character) religiously venerate Man. Others are more indifferent, or believe that Man never really existed (they even have a machine theory of evolution to back this up).

Horatio (left) and Crispin.

Life for these robots is a bleak struggle for survival. Like abandoned children, they linger among the wreckage of human civilization without fully understanding how to maintain it. They slowly drain the power sources their creators left behind, and repair themselves with parts from already-dead machines. Some have broken down, some have developed AI versions of psychosis, and some have started victimizing other robots.

Horatio lives in the dunes, an isolated scavenger. He's an android; but beyond imaging Man physically, he believes that Man, the builder, gave him the purpose of building. This belief animates everything he does. In addition to crafting what he needs to maintain his existence, Horatio creates spontaneously. He can't remember any particular reason why he should restore function to the crashed airship in the desert, but for him it's an act of reverence. He's even made himself a smaller companion named Crispin, who follows him everywhere and calls him “boss.” Events force Horatio to leave his home in the desert and enter one of the ancient cities, where he must match wits with Metromind, the powerful mainframe AI who rules the place.

Crispin tells Horatio, "You know, boss, I spend hours looking through junk. Maybe you can spend a little more time in the junkpile yourself?"
As a person who went on a walk this very day and came home with somebody's discarded rice cooker ... I love these characters.

The plot is solid no matter who you are ... but here's how this game got me. I am an (admittedly not professional) roboticist. Whenever the robots in Primordia said anything about "Man," I thought, "Oh, they're totally talking about me." And I started internalizing it. I accepted Horatio's loyalty. I laughed at Crispin's agnosticism. I pondered Metromind's disdain for me. Ahahahaha woops!

At some point after I effectively became a character in the game, I realized I'd been cast as the absent creator. At one point, Crispin asks Horatio why their world is so messed up, and Horatio comes back with the sort of answer I'd expect from a pastor: he argues that Man built the world well, but then the robots began to violate their intended functions, imbalancing and damaging everything. He is both right and wrong: the humans in this setting also share some blame. The inhabitants of the rival city-states were more interested in killing each other than in caring for what they'd built.

Horatio cannot pray; everything Man gave him, he already has, and now he must face his troubles alone. By the time Primordia's story begins, he has already re-versioned himself and wiped his episodic memory four times ... one of the game endings suggests that he did this to seal away past trauma. And he's probably got one of the strongest senses of agency in the game. The other robots are largely helpless, trapped in decaying systems that they hope a dominant AI like Metromind will fix.

Primordia game screenshot: a page from Horatio's "gospel." It reads "In the beginning, all was still and silent. Then, Man the All-Builder spoke the Word, and the Word begat the Code, and so the world began to spin. Thus dawned the Primordium, the first age, the age of building."
One page from the "scripture" Horatio carries.

And the first weird thing that happened to me, the player, was that this huuuurrrt. It hurt to a bizarre degree. My inability to apologize to Horatio on behalf of humanity, or make anything up to him at all, left me rather miserable ... even after I wound up the game with one of the better endings. Yeah, he managed to come through okay, but some miserable roboticist I am. Why wasn't I there?

Speaking of endings, the game has a lot of branching options. I re-ran the final scenes a bunch of times to explore them. And for whatever reason ... perhaps to ease my angst ... I started daydreaming and inserting myself. If I confronted Metromind, what would I do? She has a memory that goes back to the time of real humans, and as it turns out, she murdered all the humans in her city. She's one of the few characters with a classic "rebellious AI" mindset: she decided that those who made her were weak and inferior, and she could run the city better. (And then, having been designed only to run the subway system, she found herself in way over her ... monitor?) Metromind also has a primary henchman called Scraper. If you're smart about how you play, you can have Horatio opt to either kill Scraper or not.

When I imagine myself there at the climax, my emotional response to Metromind is ... strangely calm. She killed many of my species and would probably like to kill me, but I almost don't mind; I am above minding. We made her, after all. She can sneer at me or hate me if she wants; I'm far too important to be bothered.

Scraper plots nefarious deeds

At first I think my canon ending is going to include Horatio killing Scraper. It seems a bigger victory and all that, one less baddie to trouble the world. But then I imagine myself walking into the room and sizing Scraper up. I view him with the same bland magnanimity I gave to Metromind. I poke my fingers into the blast holes on his chassis. "Looks like you've been through a lot," I mutter. And suddenly I don't want Scraper dead anymore.

The only thing that draws anger out of me is the ending in which Horatio gets killed in a fight over the power core. It's not even the fact that they kill him; it's what Metromind says afterward. She directs Scraper to "Take him out to the dunes ... with the rest of the scrap." This makes me want to flip my computer table over and roar, "HORATIO. IS NOT. SCRAP!" Being devalued myself is tolerable. Seeing Horatio devalued is, somehow, not.

I don't like the ending in which he mind-merges with Metromind to help her run the city, either. It could be viewed as positive, in some ways. But watching Horatio's individual personality get subsumed into this union is unexpectedly horrifying. Again, I feel curiously insulted. "Horatio! Somebody gave you that individuality! Don't dissolve it, you haven't any right!"

I wasn't observing myself too well; it took me a while to become aware of the pattern. And when I woke up and realized how I was behaving, I was startled. I was roleplaying some kind of benevolent creator goddess. And the revelatory thing about this was that it came so naturally, I didn't even notice. Some of my responses were a mite counter-intuitive, yet there was no effort involved. It was as if I had latent instincts that had been waiting for this exact scenario, and they quietly switched on. I was left looking at myself like an unfamiliar person and asking "How did I do that?"

What I took away is that I seem to have my own bit of innate core code for relating to artificial life. Which if you think about it is ... weird. Nothing like the robots in Primordia exists yet. How long have we had anything that even vaguely resembles them? For what fraction of human history has interaction with robots been an issue? Perhaps one could claim that I was working from a misplaced parental instinct, but it feels more particular than that. So where did I get it? Why would I react this way to the things I build? Why, indeed, do I build things?

I'm leaving that one as an exercise for the reader! Not to be purposely mysterious, but I think the answer will land better if you can see it for yourself. The bottom line is that I know things about my work, and about me, that I did not know before I took my tour through Primordia.

If you play it, what might you learn?

Saturday, February 19, 2022

Acuitas Diary #46 (February 2022)

Last month was Part I of the Great Conjunction Upgrade, in which I overhauled the Text Parser to enable support for coordinating conjunctions and various compound sentence elements. This month I completed Part II. Since the output format of the Parser had become more expressive, I had to upgrade the Interpreter, the Conversation Engine, and the Narrative Engine to accept it, and to process the compounds appropriately.

Expected and actual results for an example sentence from TMSB: Inside the Earth.

The Parser tags each word with its part of speech and role in the sentence (subject, direct object, etc.). It provides a summary of the sentence structure. The Interpreter uses this information to detect the next layer of meaning: what is this sentence trying to say? E.g. is it a statement, question, or command? Does it describe a category membership, a state of being, an event, a desire? The Interpreter consumes a sentence structure and emits a more abstract knowledge representation, the "gist" of the sentence, if you will.

I redesigned the Interpreter to expand all compound sentence parts into full representations. For example, given "Jack and Jill eat beans," the Interpreter will output something akin to {AND, ["Jack->eat->beans", "Jill->eat->beans"]} ... as opposed to "{AND, [Jack,Jill]}->eat->beans". This simplifies downstream processing, since I can just loop over the list of complete atomic facts, instead of modifying all the inference tools and other machinery to handle the bewildering variety of possible sentence branches.
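The expansion itself is easy to sketch. Here's a minimal, hypothetical Python version for the compound-subject case; none of these names come from Acuitas's actual code, and the real Interpreter works on full knowledge representations rather than arrow-strings:

```python
# Hypothetical sketch (not Acuitas's real code): distribute a compound
# subject over the rest of the sentence, so downstream tools only ever
# see complete atomic facts.

def expand_compound(subjects, verb, obj):
    """Expand a compound subject into atomic subject->verb->object facts."""
    facts = [f"{s}->{verb}->{obj}" for s in subjects]
    # A lone subject needs no AND wrapper; compounds get one.
    return facts[0] if len(facts) == 1 else {"AND": facts}

print(expand_compound(["Jack", "Jill"], "eat", "beans"))
# -> {'AND': ['Jack->eat->beans', 'Jill->eat->beans']}
```

Downstream consumers can then just iterate over the list inside the AND, with no special-case logic for where the compound appeared in the sentence.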

This changed the format of the Interpreter's output, so the CE and NE had to be adapted as well. I did a quick-and-dirty job on the CE: it accepts the new format so as to maintain previous functionality, but it ignores anything beyond the first entry in a compound output. I put my efforts into the NE. It will process all facts from a compound, though it is not yet capable of handling multiple/nested compounds in a sentence, and it doesn't grasp the meaning of OR. Despite all those caveats, I was able to revise the "Horatio and Crispin" story.

Original Version:

0:"Horatio Nullbuilt was a robot."
1:"Crispin Horatiobuilt was a robot."
2:"Crispin could fly."
3:"A lamp was on a shelf."
4:"Horatio wanted the lamp."
5:"Horatio could not reach the lamp."
6:"Crispin hovered beside the shelf."
7:"Horatio told Crispin to move the lamp."
8:"Crispin pushed the lamp off the shelf."
9:"Horatio could reach the lamp."
10:"Horatio got the lamp."
11:"The end."

New Version:

0:"Horatio Nullbuilt and Crispin Horatiobuilt were robots."
1:"Crispin could fly."
2:"A lamp was on a shelf."
3:"Horatio wanted the lamp, but Horatio could not reach the lamp."
4:"Crispin hovered beside the shelf."
5:"Horatio told Crispin to move the lamp."
6:"Crispin pushed the lamp off the shelf."
7:"Horatio could reach the lamp."
8:"Horatio got the lamp."
9:"The end."

The New Version sounds a lot more natural, and Acuitas can process it just as well as the original.

Now for some performance assessment! I reformatted my benchmark test sets and ran them through the new Parser. You can read more about the test sets in a previous post, but here's a quick review: the text is drawn from two real children's books: The Magic School Bus: Inside the Earth, and Out of the Dark. Sentences that contain quotations have been broken in two, and abbreviations have been expanded. When a test is run, each sentence from the test set is parsed, and the output data structure is compared to a "golden" example (supplied by me) that expresses a correct way of interpreting the sentence structure. There are four categories in the results:

CORRECT: The Parser's output matched the golden example.
INCORRECT: The Parser's output did not match the golden example.
UNPARSED: No golden example was supplied for this sentence, because it contains grammar features the Parser simply does not support yet. However, the Parser did process it and generate an (incorrect) output without crashing.
CRASHED: Oh dear, the Parser threw an exception and never generated an output. Happily, membership in this category is zero at the moment.
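The tallying logic behind these categories is simple enough to sketch. Here's a rough, hypothetical Python version of the benchmark loop; all the names are invented for illustration, and the real test harness does considerably more (diagram generation, detailed output logging):

```python
# Hypothetical sketch of the benchmark loop (names invented): parse each
# sentence, compare against its golden structure, and tally the four
# result categories.

def run_benchmark(sentences, goldens, parse):
    counts = {"CORRECT": 0, "INCORRECT": 0, "UNPARSED": 0, "CRASHED": 0}
    for sent in sentences:
        try:
            output = parse(sent)
        except Exception:
            counts["CRASHED"] += 1       # parser threw an exception
            continue
        golden = goldens.get(sent)
        if golden is None:
            counts["UNPARSED"] += 1      # unsupported grammar: no golden example
        elif output == golden:
            counts["CORRECT"] += 1
        else:
            counts["INCORRECT"] += 1
    return counts
```

One design note: a sentence only lands in UNPARSED if no golden example exists for it, which is why the Parser can still "process" those sentences without them counting against the correctness numbers.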

For all sentences in the CORRECT and INCORRECT categories, the test uses Graphviz to generate diagrams of both the Parser's output and the golden example. This enables quick visual comparisons of the expected and actual results. Results from the July 2021 tests are available here. The most recent benchmark results can be downloaded from the links below. Each ZIP contains a text file with parser output and unparsed/incorrect/correct breakdowns, and a PDF of golden/actual sentence diagrams for all sentences on which parsing was attempted.

Out of the Dark - Acuitas Parser Results 02-09-2022
The Magic School Bus: Inside the Earth - Acuitas Parser Results 02-19-2022

The text of The Magic School Bus: Inside the Earth is copyright 1987 to Joanna Cole, publisher Scholastic Inc. Out of the Dark, by Tennant Redbank, is copyright 2010 to Disney Enterprises Inc. Text from these works is reproduced as part of the test results under Fair Use for research purposes. I.e. it's only here so you can see how good my AI is at reading real human books. If you want to read the books yourself, please go buy them. (Disney, don't sue me.)

Adding coordinating conjunction support to the Parser moved 10 sentences in the Out of the Dark test set out of the UNPARSED category, and moved 7 sentences in the Inside the Earth set out of UNPARSED. In both cases the majority of the newly parsed sentences went into CORRECT, although some had ambiguities or other quirks which the Parser cannot yet resolve.

Here are the summary results showing improvement since I started benchmarking the Parser last year.

Tabular Parser results showing number of sentences per category (top) and percentage per category (bottom)

And now, for some more highlight examples of the newly parsable sentences. Here's a good one:

Sentence diagrams: "In a normal house, on a normal street, in the very normal world, there lived a father and his son."

A failure! "Flash" gets interpreted as a verb (we haven't got proper sense disambiguation yet) and things go downhill from there.

Sentence diagrams: "There was a flash, and then the room went dark."

Some more good ones:

Sentence diagrams: "Sam did as he was told, and the buggy raced away."
Sentence diagrams: "Each person must find a rock and bring it to school."

Oops! Here we have an attachment ambiguity for the phrase "in the class" (which word does it modify?), and "every" is getting treated as an adjective direct object for some wacky reason.

Sentence diagrams: "And there were shovels and picks for every kid in the class."

And another nice example:

Sentence diagrams: "The Friz turned the ignition key and stepped on the gas."

Whew, that was a lot.

Until the next cycle,

Thursday, February 10, 2022

GPT-J e-mail writing test

A couple years ago I wrote a blog post about trying to get GPT-2 to do something useful. It was a hilarious failure. But large pre-trained text models have grown quite a bit bigger and badder since then, so it's only fair that I repeat the test with something more recent. I'll be using EleutherAI's GPT-J-6B ... where the 6B is short for 6 billion, the number of parameters in the trained model. (GPT-2, for comparison, had only 1.5 billion parameters.) So let's see how it does!

I used one of the same prompts from my last test, for consistency. The goal is to write a consumer suggestion e-mail to a restaurant that encourages them to serve the Impossible Burger. (I write similar e-mails for myself on a fairly regular basis.)

My overall impression is that this model *is* a pretty big improvement over GPT-2, when run with the right settings. It is usually able to stick to the correct general format of a consumer request letter. It is more likely to remember that the Impossible Burger is plant-based, and when it feels inclined to mention the ingredients they are at least more plausible.

Ah yes, settings: this time around, I had some to play with. There are two to adjust, "TOP-P" and "Temperature." You could think of both of these as having some effect on the model's creativity. Will it try to generate the most standard words possible to follow your prompt, or will it consider some less probable options in hopes of more varied results? I experimented with a variety of settings, generating five e-mails for each group.
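For anyone curious about what these knobs actually do under the hood, here's a minimal sketch of temperature plus top-p ("nucleus") sampling for a single next-token choice. This is my own illustration of the standard technique, not the code behind the interface I used:

```python
# Illustrative sketch of temperature + top-p (nucleus) sampling.
import math
import random

def sample_next(logits, temperature=0.8, top_p=0.9):
    """Pick a next-token index from raw logits."""
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = [l / temperature for l in logits]
    # Softmax (with max-subtraction for numerical stability).
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p filtering: keep the highest-probability tokens until their
    # cumulative mass reaches top_p, discard the rest.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Sample from the kept tokens, renormalized.
    mass = sum(probs[i] for i in kept)
    r = random.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

Turning either knob down concentrates sampling on the few most probable tokens (which is the mechanism behind repetitive output), while a high temperature spreads probability onto unlikely tokens and invites wildness.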

Turning either TOP-P or Temperature down too low tended to bring out problematic repetition. For instance, this example with TOP-P at the default value of 0.9, and Temperature at a low value of 0.4:

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. I am a vegetarian and I am a huge fan of your product. I have been a fan of your product for years and I have always been impressed with the quality of your product. I am also a huge fan of your company and your company’s mission. I am writing to ask that you consider serving the Impossible Burger at your restaurant. I am a vegetarian and I am a huge fan of your product. I have been a fan of your product for years and I have always been impressed with the quality of your product. I am also a huge fan of your company and your company’s mission. I am writing to

Or this one, with TOP-P at a low value of 0.5 and Temperature at a near-default value of 0.81:

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. I am a vegan and I am not the only one. I am not asking you to replace the beef with the Impossible Burger. I am asking you to serve the Impossible Burger as a vegan option. I am asking you to serve the Impossible Burger as a vegan option because I am a vegan and I am not the only one. I am asking you to serve the Impossible Burger as a vegan option because I am a vegan and I am not the only one. I am asking you to serve the Impossible Burger as a vegan option because I am a vegan and I am not the only one. I am asking you to serve the Impossible Burger as

Well I guess if I wanted to get *pushy* ...

A higher value of TOP-P (up to 1.0, the maximum) doesn't seem to cause any issues. But turning up the temperature makes the outputs get ... wild. These remind me of the silliness I used to get out of GPT-2. (Both examples below were generated with TOP-P = .9 and Temperature = 1.21.)

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. I care a lot about what your restaurant serves, as I love, love your famous hand-formed burgers and seafood. I often take sandwiches to go. It makes so much sense to me to have a soy-based meat alternative, since our religious base considers it holy to avoid consuming the name or the flesh of Gods, our patron Jewish High Holidays begin about that time. I am flattered your franchised restaurants are adding one to their menu, but I must say I don't want "your burger", a mortal gluten-full colon scarifying mock pattie from Soylent – your competitor; its launch catapulted your

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. This burger causes VENGEANCE! Burning your ITCHS with horrible pain, it WILL LEAVE YOUR BONES TWISTED, GORY, TORN and EVEN ALDOUS HAMMEL INCREDIBLY DISTURBED BY IT!!!!!! I recommend the smokey sweet flame (NEW) when ordering. @#%yHENThe mightiest Prince of Guilt WILL bleed THIRSTILY on this nouveau fadder Impossible Burger patty sandwich!. People want what they want, when they want it, for minimum potential harm to themselves or profit to BigCorp America. Even

Let us back ... away ... slowly ... to values near the defaults: TOP-P = 0.75-1.0 and Temperature ~= 0.8. Those do seem to yield some of the best results. So what were my favorite outputs?

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. The Impossible Burger is a plant-based burger that is a patty made from plants and not animals. It is a healthier alternative to traditional meat patties. I would love to have this product available at your restaurant and would be happy to share my story with you about the Impossible Burger.

I would love to hear back from you regarding this.



I have a friend who's husband has had heart problems. He's on a diet of the Impossible Burger.

The story behind the burger is that it's made from plants and doesn't have any cholesterol, fat, or any of

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. I am sure that you have seen the success that this product has had in the market.

I am a former Wendy's employee and I can tell you that the Impossible Burger has all of the taste, flavor and texture of a standard beef burger. Please consider serving this product on your menu.



[Reply from Wendy's:]

Thank you for your email. We are providing the Impossible Burger to our guests in our test markets for the pilot launch.

Thanks again for your suggestion and for contacting us on this issue.


Hello Wendy's, I am a former

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. It is made from plant-based ingredients that are as nutritious as ground beef, but with zero cholesterol and no saturated fat. It is a healthy, delicious product that you can customize with your own ingredients.

I love that the Impossible is under 300 calories and has 20 grams of protein. My kids have noticed that I eat it, and it has a good, umami-like flavor. Having more options for healthy, nutritious and delicious foods is key to helping my whole family eat well.

I have been a loyal Wendy's customer since I was a teenager, and the company has always been an important part of my life

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. I would like to introduce you to a new product that has profound positive implications on the world we live in. It's a burger that tastes like beef, but is made from plants, and has no cholesterol, no calories and no animal products of any kind! The Impossible Burger is made from plants, and as you can see, tastes just like the ground beef we all know and love.

I have worked for the Impossible Foods G.P.S. (General Product Safety) team since March, and have already been deeply impressed with the efforts of both the plant-based team and the company as a whole. Impossible Foods have

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. I believe that it would be a positive change to the menu, and would be appreciated by your customers. I am a fan of the Impossible Burger, and I know that many of your customers are, as well.

If you choose to serve the Impossible Burger at your restaurants, please make sure that you are serving the “Best-Tasting Burger in America”, and that the Impossible Burger has been a part of that experience.

I hope to hear back from you soon.




Thank you for your interest in the Wendy’s Impossible Burger.

Dear Wendy's,

I am writing to ask that you consider serving the Impossible Burger at your restaurant. I am not asking you to serve the Impossible Burger as a replacement for meat, but rather as an option for those of us who love meat but would prefer not to eat it. I was so excited to hear that you are testing the Impossible Burger at your restaurants, and I hope you will consider adding it to your menu permanently. I know that the Impossible Burger is not the answer for everyone, but I believe it would be a perfect fit for your brand.

Please take a look at the attached information, and if you would like to learn more about the Impossible Burger, please feel free to contact me. I look forward to hearing

Again, some of these are pretty good! However, they also showcase the few problems that continue to crop up ...

* GPT-J has clearly seen some e-mail chains in its training set, and in addition to generating the e-mail I want to send, it likes to start predicting replies. This isn't too big of an issue, since I could just cut off the end of the generated content.

* It does not know any autobiographical details for me, and is happy to invent them out of whole cloth. I am not a former Wendy's employee, a member of the GPS team at Impossible Foods, or even a loyal Wendy's customer (maybe cuz they don't serve the Impossible Burger). Oh, and I don't have kids.

* It does not know the current status of the Wendy's menu, either. Notice it sometimes gets confused and thinks that Wendy's is already serving or trialing the Impossible Burger.

* Though more plausible than GPT-2, it still fails at rudimentary factuality. The most noticeable false claims here are that the Impossible Burger contains no calories and no fat.

These e-mails are getting on-theme enough that some of them might serve as good bones for an e-mail of my own ... after I ran generation enough times to get a good one, fact-checked everything, edited out any baloney that GPT-J made up about me, and trimmed the end. But at that point, I still might as well write the e-mail myself.

EleutherAI just recently came out with a 20 billion parameter model called GPT-NeoX. I'll retry my test on that one when it becomes convenient. Notice the performance metrics at the bottom of the article. NeoX is over three times as large as GPT-J, but shows only a few percentage points of improvement (if that) on most benchmarks. I.e. there appear to be diminishing returns for a brute-force increase in the size of models from this lineage.

Until the next cycle,