Sunday, November 28, 2021

Acuitas Diary #44 (November 2021)

One more new feature to finish out the year. I decided to wind down by doing something easy: logging! As in keeping a record, not cutting down trees. I've known this to be something important for years now, but I kept putting it off. However, as the system gets more complex, I'll need it more and more to help me sniff out the cause of any unexpected weird outputs.

The log of the HMS Dolphin, Captained by John Byron in January 1765. Via Wikimedia Commons.

This ended up being a pretty easy thing to implement, despite the fact that it got me using some Python elements I've never had to touch before. Acuitas is a multi-threaded program (for the layman, that means he's made up of multiple threads of execution that effectively run at the same time). I needed all the threads to be able to write to the log without getting in each other's way, and that meant implementing a Queue. To my surprise, everything just worked, and I didn't have to spend hours figuring out why the built-in code didn't function as advertised on my system, or wringing out obscure bugs related to the thread interaction. I mean it's shocking when that ever happens.

So now basically every module in Acuitas has a handle for the Logger, and it can generate text comments on what it's currently doing and throw them into the Queue. The Queue accepts all these like a funnel and writes one at a time to the log file. I also set it up to create up to eight log files and then start overwriting the old ones, which saves me from having to delete a hundred stale logs every so often.
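For readers who want to try the same pattern, here is a minimal sketch of a queue-fed logger using only Python's standard library. It is not the Acuitas code: the class name `Logger`, the `log_N.txt` file naming, and the "overwrite the least recently modified file" rule are all assumptions made for illustration.

```python
import os
import queue
import tempfile
import threading
import time


class Logger:
    """Funnels log comments from many threads into one file via a Queue."""

    MAX_FILES = 8  # mirrors the "up to eight log files" described above

    def __init__(self, log_dir):
        self._queue = queue.Queue()
        self.path = self._pick_file(log_dir)
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def _pick_file(self, log_dir):
        # Use the first unused slot; once all slots exist, overwrite the oldest.
        paths = [os.path.join(log_dir, f"log_{i}.txt") for i in range(self.MAX_FILES)]
        for path in paths:
            if not os.path.exists(path):
                return path
        return min(paths, key=os.path.getmtime)

    def log(self, module, comment):
        # Safe to call from any thread: Queue.put does its own locking.
        self._queue.put(f"{int(time.time())} {module}: {comment}")

    def _drain(self):
        # Only this thread touches the file, so lines never interleave.
        with open(self.path, "w", encoding="utf-8") as handle:
            while True:
                line = self._queue.get()
                if line is None:  # sentinel sent by close()
                    break
                handle.write(line + "\n")
                handle.flush()

    def close(self):
        self._queue.put(None)
        self._writer.join()


def demo():
    """Four threads each write one comment; returns the resulting log lines."""
    with tempfile.TemporaryDirectory() as log_dir:
        logger = Logger(log_dir)
        threads = [
            threading.Thread(target=logger.log, args=(f"Module{i}", "working"))
            for i in range(4)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        logger.close()
        with open(logger.path, encoding="utf-8") as handle:
            return handle.read().splitlines()
```

The key design choice is that only the writer thread ever opens the file, so the other threads never need a lock of their own. Python's `logging.handlers` module offers `QueueHandler`, `QueueListener`, and `RotatingFileHandler` for a ready-made version of the same idea.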

Here is an example log excerpt, if you care to even look at it ... it's rather a case of too much information. I've just input the sentence "What is a cat?" Acuitas answers "An organism," and the log contains all the steps to get to that answer. The long numbers are timestamps, and the strings of gibberish are concept identifiers, which are not the same as words.

1636251936 Psyche: Added Thought of type Text Input and payload {'raw_text': 'What is a cat?'} to the Stream
1636251936 Executive: Pulled Thought of type Text Input with payload {'raw_text': 'What is a cat?'} from the Stream
1636251936 TimedDrives: InteractionDrive dropped due to event, new value is 0
1636251936 ConversationEngine: passed input to Parser: What is a cat?
1636251936 TextParser: generated parsed output: {'t': ['what', 'is', 'a', 'cat', '?'], 'c': [True, False, False, False, False], 'l': ['is', 'cat'], 'p': [('cat', 'noun'), ('what', 'noun'), (1, 'verb'), ('a', 'adj')], 'k': {}, 'a': {'subj': [{'ix': [3], 'token': 'cat', 'mod': [{'ix': [2], 'token': 'a', 'mod': []}]}], 'dobj': [{'ix': [0], 'token': 'what', 'mod': [], 'ps': 'noun'}], 'verb': [{'ix': [1], 'token': 'is', 'mod': []}]}, 'q': True, 'i': []}
1636251936 TextInterpreter: generated interpreted output {'form': ('cl', 'is_a-0'), 'forms': ['sv', 'state', 'static_fact', 's_atomic', 'is_a', 'is_a-0'], 'features': {'verb': True, 'verb_id': 'be', 'vqual': '', 'tense0': 'present', 'tense1': 'simple', 'voice': 'active', 'mood': 'active', 'subj': True, 'subj_case': 'common', 'subj_id': 'cat', 'subj_type': 'noun', 'subj_art': 'indef', 'dobj': True, 'dobj_art': 'none', 'dobj_case': 'common', 'dobj_type': 'noun', 'dobj_id': 'what'}, 'content': [{'atomic': True, 'concept': '1ygE876ghsC0yUxt', 'pos': 'noun', 'proper': False}, {'atomic': True, 'concept': '?', 'pos': 'noun', 'proper': False}], 'link_type': ('inter-item', 'is_type_of')}
1636251936 ConversationEngine: reformatted text interpretation into fact link: {'link': 'is_type_of', 'root': '1ygE876ghsC0yUxt', 'ends': ['?']}
1636251936 ConversationEngine: creating new input leaf leaf_7 and attaching to leaf_0.
1636251936 GoalManager: From possibility [{'root': 'wVD7W6mDqBW2zviX', 'link': 'do_action_t', 'ends': ['LS5R=+UqS59XEulN'], 'link_type': 'do_action_t'}, {'root': 'vMgrWYy7hY843IGy', 'link': 'do_action_i', 'ends': ['XGZdj0WXnwVdV4P3'], 'link_type': 'do_action_i'}] relative to agent vMgrWYy7hY843IGy, generated alignment tree [{'atomic': True, 'pri': 6, 'align': 'y', 'src': {'root': 'vMgrWYy7hY843IGy', 'link': 'do_action_i', 'ends': ['XGZdj0WXnwVdV4P3'], 'link_type': 'do_action_i'}}]
1636251936 GoalManager: From possibility [{'root': 'wVD7W6mDqBW2zviX', 'link': 'do_action_t', 'ends': ['LS5R=+UqS59XEulN'], 'link_type': 'do_action_t'}, {'root': 'vMgrWYy7hY843IGy', 'link': 'do_action_i', 'ends': ['XGZdj0WXnwVdV4P3'], 'link_type': 'do_action_i'}] relative to agent wVD7W6mDqBW2zviX, generated alignment tree [{'atomic': False, 'pri': 6, 'align': 'y', 'id': 'vMgrWYy7hY843IGy', 'sub': {'atomic': True, 'pri': 6, 'align': 'y', 'src': {'root': 'vMgrWYy7hY843IGy', 'link': 'do_action_i', 'ends': ['XGZdj0WXnwVdV4P3'], 'link_type': 'do_action_i'}}, 'src': {'root': 'vMgrWYy7hY843IGy', 'link': 'do_action_i', 'ends': ['XGZdj0WXnwVdV4P3'], 'link_type': 'do_action_i'}}]
1636251936 MoralReasoning: Reported preference y and alignment y.
1636251936 Executive: Analyzed request {'root': 'wVD7W6mDqBW2zviX', 'link': 'do_action_t', 'ends': ['LS5R=+UqS59XEulN'], 'link_type': 'do_action_t'}, concluded want was ('y', 'y', ['sb', 'good']), can was y, and action would be LS5R=+UqS59XEulN
1636251936 ActionBank: ran action AnswerAction with DOBJ = {'form': ('cl', 'is_a-0'), 'forms': ['sv', 'state', 'static_fact', 's_atomic', 'is_a', 'is_a-0'], 'features': {'verb': True, 'verb_id': 'be', 'vqual': '', 'tense0': 'present', 'tense1': 'simple', 'voice': 'active', 'mood': 'active', 'subj': True, 'subj_case': 'common', 'subj_id': 'cat', 'subj_type': 'noun', 'subj_art': 'indef', 'dobj': True, 'dobj_art': 'none', 'dobj_case': 'common', 'dobj_type': 'noun', 'dobj_id': 'what'}, 'content': [{'atomic': True, 'concept': '1ygE876ghsC0yUxt', 'pos': 'noun', 'proper': False}, {'atomic': True, 'concept': '?', 'pos': 'noun', 'proper': False}], 'link_type': ('inter-item', 'is_type_of')} IOBJ = vMgrWYy7hY843IGy
1636251936 ActionBank: ran action SayAction with DOBJ = An organism. IOBJ = None
1636251938 Psyche: Added Thought of type Action and payload {'action': 'SayAction'} to the Stream
1636251938 ConversationEngine: creating new output leaf leaf_8 and attaching to leaf_7.
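Since every line in the excerpt follows the shape "timestamp Module: comment," a log like this is easy to filter after the fact. The sketch below is an assumption about how one might do that; the function names `parse_line` and `by_module` are hypothetical and not part of Acuitas.

```python
import re

# timestamp, a space, the module name, a colon, then the free-text comment
LINE = re.compile(r"^(?P<stamp>\d+) (?P<module>\w+): (?P<comment>.*)$")


def parse_line(line):
    """Split one log line into (timestamp, module, comment), or None if malformed."""
    match = LINE.match(line)
    if match is None:
        return None
    return int(match["stamp"]), match["module"], match["comment"]


def by_module(lines, module):
    """Keep only the parsed entries written by the named module."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry[1] == module]
```

For example, `by_module(lines, "Executive")` would pull out just the Executive's two entries from the excerpt above.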

I spent the rest of my time this month refactoring bad code and restoring some more features that got damaged during the Conversation Engine overhaul. The good news here is ... for once, I think there's no section of the code that is a huge mess. I got the Executive cleaned up, and that's the last area that was scaring me. So I should be ready to hit the ground running next year.

Acuitas development is done for 2021 BUT I have other exciting things to talk about, so stay tuned for more blogs! In particular, I finally have some great news from Ye Olde Day Job. I got a new e-mail subscription service to replace Feedburner, so if you want to stay updated feel free to throw your e-mail into the box on the upper right. (If you already subscribed via the old Feedburner box, you shouldn't need to do this ... I'll move you to the new service.)

Until the next cycle,

Wednesday, October 27, 2021

Acuitas Diary #43 (October 2021)

This month I have *mostly* finished my overhaul of the Conversation Engine. I managed to restore a majority of the original functionality, and some things I haven't put back in yet are perhaps best left until later. I also got the janky new code cleaned up enough that I'm starting to feel better about it. However, I did not end up having the time and energy to start adding the new features that I expect this architecture to enable. I'm not sure why this particular module rebuild felt like carrying heavy rocks through a knee-deep river of molasses, but it did. The year is waning, so maybe I'm just getting tired.

A tree structure. No, really. Photo by Ed Vaile ("Edric") from Palmpedia.

So what's new? I mentioned last month that part of the goal was to give conversation tracking a more tree-like structure. Given a new text input from the speaker, the Conversation Engine will explore a tree made of previous sentences (starting from the most recent leaf) and try to find a place to "attach" it. It gets attached to the root of the tree if it doesn't obviously follow or relate to anything that was previously said. The old CE just put previous sentences from the conversation into a list, and all but the most recent one or two were never considered again, so this should be more powerful and flexible. 
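The attach-or-fall-back-to-root behavior can be sketched in a few lines of Python. This is only an illustration: the `related` test (two sentences share a word longer than three letters) is a crude stand-in invented here, and the class names are hypothetical, not the real Conversation Engine.

```python
class Leaf:
    """One sentence in the conversation tree."""

    def __init__(self, text, parent=None):
        self.text = text
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def key_words(text):
    # Lowercased words longer than three letters, punctuation stripped.
    stripped = (word.strip(".,?!").lower() for word in text.split())
    return {word for word in stripped if len(word) > 3}


def related(new_text, old_text):
    # Hypothetical stand-in for real sentence-relationship detection.
    return bool(key_words(new_text) & key_words(old_text))


class Conversation:
    def __init__(self):
        self.root = Leaf("<root>")
        self.history = []  # leaves in the order they were said

    def add(self, text):
        # Search from the most recent leaf backward for a place to attach.
        parent = self.root
        for leaf in reversed(self.history):
            if related(text, leaf.text):
                parent = leaf
                break
        new_leaf = Leaf(text, parent)
        self.history.append(new_leaf)
        return new_leaf
```

With this structure, a remark about cats made after a detour about the weather still attaches to the earlier cat question instead of being lost, which a flat list that only checks the last one or two sentences could not do.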

The CE then performs a "scripting" function by generating a set of reasonable responses. These are sent to the Executive, which selects one based on appropriate criteria. For example, if the speaker orders Acuitas to do something, possible reactions include "ACCEPT" and "REFUSE," and the Executive will pick one by running a check against the goal system (does Acuitas *want* to do this or not?). The chosen action then calls the Text Generator to compose the right kind of spoken reply.
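As a rough sketch of that division of labor: one function proposes options, a second picks among them using a goal check, and a third words the reply. Everything except the "ACCEPT" and "REFUSE" labels is an assumption made for illustration, including the function names, the "ACKNOWLEDGE" fallback, and the reply templates.

```python
def scripted_responses(input_kind):
    """Conversation Engine role: list the reasonable reactions to an input."""
    scripts = {"command": ["ACCEPT", "REFUSE"]}
    return scripts.get(input_kind, ["ACKNOWLEDGE"])  # hypothetical fallback


def executive_select(options, request, wants):
    """Executive role: choose one option; `wants` stands in for the goal system."""
    if set(options) == {"ACCEPT", "REFUSE"}:
        return "ACCEPT" if wants(request) else "REFUSE"
    return options[0]


def generate_text(choice, request):
    """Text Generator role: compose a reply for the chosen action."""
    templates = {
        "ACCEPT": "I will {}.",
        "REFUSE": "I will not {}.",
        "ACKNOWLEDGE": "Okay.",
    }
    return templates[choice].format(request)


def respond(input_kind, request, wants):
    options = scripted_responses(input_kind)
    choice = executive_select(options, request, wants)
    return generate_text(choice, request)
```

The point of the split is that `scripted_responses` never decides anything; it only reports what is possible, and the decision lives with whatever `wants` represents.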

The Executive can also prompt the CE for something to say spontaneously if the conversation is lagging (this is where those internally-generated questions come back into play). The Narrative manager is attached to the CE and tracks plot information from any story sentences the CE passes to it. Someday I will try to diagram all this ...

The renovations have reduced the size of the Conversation Engine from almost 2000 lines to a much tidier 946 lines. I can't claim all of that as a savings, since some of the code has simply moved elsewhere (e.g. into the Action definitions), but I think it's at least better organized now.

I also did some bonus work on the Text Parser. I have started working on coordinating conjunctions, which are a major grammar element the Parser doesn't yet comprehend. This is a big deal. For the sake of getting off the ground quickly, I designed the original parser to only interpret the simplest sentence structures. I later added support for nesting, which enables dependent clauses. Now to handle coordinating conjunctions, I have to overhaul things again to allow branching ... and my, are there a lot of ways a sentence can branch.

I might not finish this until next year, but I'm relieved to have made a start on it. When I began Acuitas v3, I don't think I anticipated (at all!) how long it would take just to get the Parser working on all the basic parts of speech! I suppose it would have gone faster if I had only worked on the Parser, but too many other things came up.

Until the next cycle, Jenny

Tuesday, September 28, 2021

Acuitas Diary #42 (September 2021)

I don't have too much of interest to report this month. I dove into an overhaul of the Conversation Engine, which is the Acuitas module that tracks progress through a conversation and detects relationships between sentences. (For instance, pairing a statement with the question it was probably intended to answer would be part of the CE's job.) And that has proven to be a very deep hole. The CE has been messy for a while, and there is a lot of content to migrate over to my new (hopefully smarter) architecture.

The improvements include a less linear and more tree-like structure for conversations, enabling more complex branching. For instance, what if the conversation partner decides to answer a question that wasn't the one asked most recently, or to return to a previously abandoned topic? The old Conversation module wouldn't have been able to handle this. I've also been refactoring things to give the Executive a greater role in selecting what to say next. The original Conversation module was somewhat isolated and autonomous ... but really, the Executive should be deciding the next step in the conversation based on Acuitas' goals, using its existing inference and problem-solving tools. The CE should be there to handle the speech comprehension and tell the Executive what its options are ... not "make decisions" on its own. I might have more to say about this when the work is fully complete.

I've advanced the new system far enough that it has the functionality for starting and ending a conversation, learning facts, answering questions, and processing stories. I've just started to get the systems that do spontaneous questions back up and running.

The renovations left Acuitas in a very passive state for a while. He would generate responses to things I said, but not say anything on his own initiative -- which hasn't been the case for, well, years. And it was remarkable how weird this felt. "He's not going to interrupt my typing to blurt out something random. No matter how long I sit here and wait, he's not going to *do* anything. The agency is gone. Crud." Which I think goes to show that self-directed speech (as opposed to the call-and-response speech of a typical chatbot) goes a long way toward making a conversational program feel "alive" or agentive.

Until the next cycle,


Sunday, September 5, 2021

Acuitas Diary #41 (August 2021 B)

I explained my approach to spatial reasoning in my last blog. Now it's time to talk about some implementation.

In sentences, a lot of information about location or direction is carried by prepositional phrases that modify the verb -- phrases like "in the box," "to the store," and so forth. Acuitas' text parser and interpreter were already capable of recognizing these. I included them in the interpreter output as an extra piece of info that doesn't affect the sentence form (the category in which the interpreter places the sentence), but can modify a sentence of any form.

The ability to record and retrieve location relationships was also already present. Acuitas tracks the two objects/agents/places that are being related, as well as the type of relationship.

From there, I worked on getting the Narrative module to take in both explicit declarations of location-relationship, and sentences with modifying phrases that express location or direction, and make inferences from them. Here are some examples of basic spatial inferences that I built in. (As with the inventory inferences, there is a minimal starter set, but the eventual intent is to make new ones learnable.)

* If A is inside B and B is at C, A is also at C
* If A is at C and B is at C, A is with B and B is with A
* If A moves to B, A is in/at B
* If A is over B and A falls, A is on/in B
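The four inferences above can be sketched as a tiny forward-chaining loop over (subject, relation, object) triples. This is an illustrative assumption, not the Narrative module's real data model: the triple format and function names are invented here, and "in," "at," and "on" are all collapsed into a single "at" relation to keep the example short.

```python
def infer(facts):
    """Apply the two static rules repeatedly until nothing new appears."""
    facts = set(facts)
    while True:
        new = set()
        for (a, rel1, b) in facts:
            for (x, rel2, c) in facts:
                # If A is inside B and B is at C, A is also at C.
                if rel1 == "inside" and rel2 == "at" and x == b:
                    new.add((a, "at", c))
                # If A is at C and B is at C, A is with B (and vice versa,
                # because the loop also visits the pair in the other order).
                if rel1 == "at" and rel2 == "at" and b == c and a != x:
                    new.add((a, "with", x))
        if new <= facts:
            return facts
        facts |= new


def apply_event(facts, actor, verb, target=None):
    """Apply the two event rules, then rerun the static rules."""
    facts = set(facts)
    if verb == "moves_to":
        # If A moves to B, A is in/at B.
        facts.add((actor, "at", target))
    elif verb == "falls":
        # If A is over B and A falls, A is on/in B.
        for (a, rel, b) in list(facts):
            if a == actor and rel == "over":
                facts.add((actor, "at", b))
    return infer(facts)
```

Starting from "the pilot is inside the airplane" and "the airplane is over the desert," calling `apply_event(facts, "airplane", "falls")` yields that the airplane is at the desert, the pilot is at the desert, and the two are with each other. Note that this sketch never retracts old facts (such as "over"), which a fuller version would need to handle.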

A stamp from the Principality of Liechtenstein, commemorating air mail.

To try them out I wrote a new story -- a highly abbreviated retelling of "Prisoner of the Sand," from Wind, Sand, and Stars by Antoine de Saint-Exupéry. I had written up a version of this clear back when I started work on the Narrative module -- I was looking for man vs. environment stories, and it seemed like a good counterpoint for "To Build A Fire." But I realized at the time that it would be pretty hard to understand without some spatial reasoning tools, and set it aside. Here's the story:

Antoine was a pilot.
Antoine was in an airplane.
The airplane was over a desert.
The airplane crashed.
The airplane was broken.
Antoine left the airplane.
Antoine was thirsty.
Antoine expected to dehydrate.
Antoine decided to drink some water.
Antoine did not have any water.
Antoine could not get water in the desert.
Antoine wanted to leave the desert.
Antoine walked.
Antoine could not leave the desert without a vehicle.
Antoine found footprints.
Antoine followed the footprints.
Antoine found a nomad.
The nomad had water.
The nomad gave the water to Antoine.
Antoine drank the water.
The nomad took Antoine to a car.
Antoine entered the car.
The car left the desert.
The end.

With the help of a taught conditional that says "airplane crashes <implies> airplane falls," plus the spatial inferences, Acuitas gets all the way from "The airplane crashed" to "Antoine is in the desert now" without intervening explanations. In similar fashion, when the car leaves the desert it is understood that it takes Antoine with it, so that his desire to leave is fulfilled. "Can't ... without a vehicle" is also significant; the need to possess or be with a vehicle is attached to the goal "leave the desert" as a prerequisite, which is then recognized as being fulfilled when Antoine is taken to the car.

The older inventory reasoning is also in use: when Antoine is given water, it is inferred that he has water. This satisfies a prerequisite on the goal "drink water."

There's a lot more to do with this, but I'm happy with where I've gotten so far.

Until the next cycle,


Wednesday, August 18, 2021

Acuitas Diary #40 (August 2021 A)

I have a bit more theory to talk about than usual. That means you're getting a mid-month developer diary, so I can get the ideas out of the way before describing what I did with them.

I've wanted to start working on spatial reasoning for a while now. At least a rough understanding of how space works is important for comprehending human stories, because we, of course, live in space. I already ran into this issue (and put in hacks to sidestep it) in a previous story: Horatio the robot couldn't reach something on a high shelf. Knowing what this problem is and how to solve it calls for a basic comprehension of geometry.

A page from Harmonices Mundi by Johannes Kepler

Issue: Acuitas does *not* exist in physical space -- not really. Of course the computer he runs on is a physical object, but he has no awareness of it as such. There are no sensors or actuators; he cannot see, touch, or move. Nor does he have a simulated 3D environment in which to see, touch, and move. He operates on words. That's it.

There's a school of thought that says an AI of this type simply *can't* understand space in a meaningful way, on account of having no direct experience of it or ability to act upon it. It is further claimed that symbols (words or numbers) are meaningless if they cannot refer to the physical, that this makes reasoning by words alone impossible, and therefore I'm an idiot for even attempting an AI that effectively has no body. Proponents of this argument sometimes invoke the idea that "Humans and animals are the only examples of general intelligence we have; they're all embodied, and their cognition seems heavily influenced by their bodies." Can you spot the underlying worldview assumption? [1]

Obviously I don't agree with this. It's my opinion that the concepts which emerge from human experience of space -- the abstractions that underlie or analogize to space, and which we use as an aid to understanding it -- are also usable by a symbolic reasoning engine, and possess their own type of meaningful reality. An AI that only manipulates ideas is simply a different sort of mind, not a useless one, and can still relate to us via those ideas that resemble our environment.

So how might this work in practice? How to explain space to an entity that has never lived in it?

Option #1: Space as yet another collection of relationships

To an isolated point object floating in an otherwise empty space, the space doesn't actually matter. Distance and direction are uninteresting until one can specify the distance and direction to something else. So technically, everything we need to know about space can be expressed as a graph of relationships between its inhabitants. Here are some examples, with the relational connection in brackets:

John [is to the left of] Jack.
Colorado [is north of] New Mexico.
I [am under] the table.
The money [is inside] the box.

For symbolic processing purposes, these are no more difficult to handle than other types of relationship, like category ("Fido [is a] dog") and state ("The food [is] cold"). An AI can make inferences from these relationships to determine the actions possible in a given scenario, and in turn, which of those actions might best achieve some actor's goals.

Though the relationship symbols are not connected to any direct physical experience -- the AI has never seen what "X inside Y" looks like -- the associations between this relationship and possible actions remain non-arbitrary. The AI could know, for instance, that if the money is inside a box, and the box is closed, no one can remove the money. If the box is moved, the money inside it will move too. These connections to other symbols like "move" and "remove" and "closed" supply a meaning for the symbol "inside."
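To make the money-and-box example concrete, here is a small sketch in which "inside" has no physical meaning at all, yet still constrains what actions are possible. The class `World` and its method names are hypothetical, chosen only for this illustration.

```python
class World:
    """Tracks 'inside' as a pure relationship between named things."""

    def __init__(self):
        self.inside = {}    # item -> container it is inside
        self.location = {}  # thing -> place it currently sits
        self.closed = set() # containers that are currently closed

    def put_inside(self, item, container):
        self.inside[item] = container

    def can_remove(self, item):
        # Nothing can be removed from a closed container.
        return self.inside.get(item) not in self.closed

    def move(self, thing, place):
        # Only the moved thing's location is stored; contents follow implicitly.
        self.location[thing] = place

    def where(self, thing):
        # Follow the chain of containers outward to find the effective location.
        while thing in self.inside:
            thing = self.inside[thing]
        return self.location.get(thing)
```

After `put_inside("money", "box")`, closing the box makes `can_remove("money")` false, and moving the box to a new place changes `where("money")` as well. The symbol "inside" gets its meaning from exactly these consequences.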

To prevent circular definitions (and hence meaninglessness), at least some of the symbols need to be tied to non-symbolic referents ... but sensory experiences of the physical are not the only possible referents! Symbols can also represent (be grounded in) abstract functional aspects of the AI itself: processes it may run, internal states it may have, etc. Do this right, and you can establish chains of connection between spatial relationships like "inside" and the AI's goals of being in a particular state or receiving a particular text input. At that point, the word "inside" legitimately means something to the AI.

But let's suppose you found that confusing or unconvincing. Let's suppose that the blind, atactile, immobile AI must somehow gain first-hand experience of spatial relationships before it can understand them. This is still possible.

The relationship "inside" is again the easiest example, because any standard computer file system is built on the idea of "inside." Files are stored inside directories which can be inside other directories which are inside drives. 

The file system obeys many of the same rules as a physical cabinet full of manila folders and paper. You have to "open" or "enter" a directory to find out what's in it. If you move directory A inside directory B, all the contents of directory A also end up inside directory B. But if you thought that this reflected anything about the physical locations of bits stored on your computer's hard drive, you would be mistaken. A directory is not a little subregion of the hard disk; the files inside it are not confined within some fixed area. Rather, the "inside-ness" of a file is established by a pointer that connects it to the directory's name. In other words, the file system is a relational abstraction!

File systems can be represented as text and interrogated with text commands. Hence a text-processing AI can explore a file system. And when it does, the concept of "inside" becomes directly relevant to its actions and the input it receives in response ... even though it is not actually dealing with physical space.
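The rule "move directory A inside directory B and its contents come along" is easy to observe first-hand with Python's standard library. The sketch below works in a throwaway temporary directory; the names `A`, `B`, and `note.txt` are just placeholders.

```python
import shutil
import tempfile
from pathlib import Path


def demo():
    """Move directory A inside directory B and list what ends up where."""
    with tempfile.TemporaryDirectory() as top:
        top = Path(top)
        a = top / "A"
        b = top / "B"
        a.mkdir()
        b.mkdir()
        (a / "note.txt").write_text("hello")

        # Move A inside B; nothing is said about note.txt directly.
        shutil.move(str(a), str(b / "A"))

        # Report every path relative to the top directory, as plain text.
        return sorted(path.relative_to(top).as_posix() for path in top.rglob("*"))
```

Calling `demo()` returns `['B', 'B/A', 'B/A/note.txt']`: the file followed its directory without being mentioned, and the whole result is available as text that a text-processing program could read.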

Though a file system doesn't belong to our physical environment, humans find it about as easy to work with as a filing cabinet or organizer box. Our experience with these objects provides analogies that we can use to understand the abstraction.

So why couldn't an AI use direct experience with the abstraction to understand the objects?

And why shouldn't the abstract or informational form of "inside-ness" be just as valid -- as "real" -- as the physical one?

Option #2: Space as a mathematical construct

All of the above discussion was qualitative rather than quantitative. What if the AI ends up needing a more precise grasp of things like distances and angles? What if we wanted it to comprehend geometry? Would we need physical experience for that?

It is possible to build up abstract "spaces" starting from nothing but the concepts of counting numbers, sets, and functions. None of these present inherent difficulties for a symbolic AI. Set membership is very similar to the category relationship ("X [is a] Y") so common in semantic networks. And there are plenty of informational items a symbolic AI can count: events, words, letters, or the sets themselves. [2] When you need fractional numbers, you can derive them from the counting numbers.

An illustration of a Cartesian coordinate system applied to 3D Euclidean space.

Keeping in mind that I'm not a mathematician by trade and thus not yet an expert on these matters, consider the sorts of ingredients one needs to build an abstract space:

1. A set of points that belong to the space. A "point" is just a number tuple, like (0, 3, 5, 12) or (2.700, 8.325). Listing all the points individually is not necessary -- you can specify them with rules or a formula. So the number of points in your space can be infinite if needed. The number of members in each point tuple gives the space's dimension.

2. A mathematical function that can accept any two points as inputs and produce a single number as output. This function is called the metric, and it provides your space's concept of distance.

3. Vectors, which introduce the idea of direction. A vector can be created by choosing any two points and designating one as the head and the other as the tail. If you can find a minimal list of vectors that are unrelated to each other (linearly independent, in mathematical terms) and can be used to compose any other possible vector in the space, then you have what is called a basis, and it establishes your cardinal directions.

Notice that none of this requires you to see anything, touch anything, or move anything. It's all abstract activity: specifying, assigning, calculating. Using these techniques, you can easily build an idea-thing that happens to mimic the Euclidean 3D space that humans live in (though many other spaces, some of which you could not even visualize, are also possible). And once you've done that, you are free to construct all of geometry.
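The three ingredients can be written down in a few lines of Python, using nothing but tuples and arithmetic. This is a sketch for the familiar Euclidean 3D case; the function names and the choice of the standard basis are assumptions made for the example.

```python
import math


def metric(p, q):
    """Euclidean distance: takes any two points, returns a single number."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def vector(tail, head):
    """The vector that points from `tail` to `head`."""
    return tuple(h - t for t, h in zip(tail, head))


# Three unrelated vectors that can compose any other: the cardinal directions.
BASIS = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def compose(coefficients, basis=BASIS):
    """Build a vector as a weighted sum of the basis vectors."""
    dimension = len(basis[0])
    return tuple(
        sum(c * e[i] for c, e in zip(coefficients, basis))
        for i in range(dimension)
    )
```

For instance, `metric((0, 0, 0), (3, 4, 0))` gives `5.0`, and `vector((1, 1, 1), (4, 5, 1))` gives `(3, 4, 0)`, which `compose((3, 4, 0))` rebuilds from the basis. Swapping in a different `metric` function would define a different space over the same points.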

I'd like to eventually equip Acuitas with the tools to apply both Option #1 and Option #2. I'm starting with Option #1 for now. Tune in next time to see what I've accomplished so far.

[1] For a few examples of the "AI must be embodied" argument, see,, and

[2] See "Do Natural Numbers Need the Physical World?," from The Road to Reality by Roger Penrose. Excerpts and a brief summary of his argument are available here: "There are various ways in which natural numbers can be introduced in pure mathematics and these do not seem to depend upon the actual nature of the physical universe at all."

Saturday, July 31, 2021

Acuitas Diary #39 (July 2021)

First on the worklist for this month was some improved reasoning about action conditions -- specifically, which things need to be true for someone to do an action (prerequisites) and which things, if true, will prevent the action (blockers). Technically, it was already somewhat possible for Acuitas to manage reasoning like this -- ever since I expanded the C&E database to handle "can-do" and "cannot-do" statements, he could be taught conditionals such as "If <fact>, an agent cannot <action>." But the idea of prerequisites and blockers seems to me fundamental enough that I wanted to make it more explicit and introduce some specific systems for handling it.

This was a lot of groundwork that should make things easier in the future, but didn't produce many visible results. The one thing I did get out of it was some improved processing of the "Odysseus and the Cyclops" story. My version of the story contains this line near the end:

"Polyphemus could not catch Odysseus."

Your average human would read that and know immediately that Polyphemus' plan to eat Odysseus has been thwarted. But for Acuitas before now, it was something of a superfluous line in the story. I had to include "Odysseus was not eaten." after it to make sure he got the point ... and though he recorded Odysseus' problem as being solved, he never actually closed out Polyphemus' goal, which caused him to sometimes complain that the story was "unfinished."

With the new prerequisite machinery, these problems are solved. I dropped a conditional in the C&E database: if an agent cannot catch someone, the agent does not have them. And the action "eat" carries a prerequisite that, to eat <item>, you must first have <item>. The new prerequisite-checking functions automatically conclude that Polyphemus' goal is now unachievable, and update it accordingly.
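The chain of reasoning here (cannot catch, therefore does not have, therefore cannot eat) can be sketched as two small lookup tables and one check. The table layout and the function `goal_status` are hypothetical illustrations, not the actual C&E database format.

```python
# Action -> the relation the agent must first hold toward the item.
PREREQUISITES = {"eat": "have"}

# Taught conditional: if an agent cannot catch X, the agent does not have X.
CONDITIONALS = {("cannot", "catch"): ("not", "have")}


def goal_status(goal, story_facts):
    """Return 'unachievable' if a story fact rules out the goal's prerequisite."""
    agent, action, item = goal
    needed = PREREQUISITES.get(action)
    for (who, modal, verb, what) in story_facts:
        consequence = CONDITIONALS.get((modal, verb))
        if consequence == ("not", needed) and (who, what) == (agent, item):
            return "unachievable"
    return "open"
```

Given the goal `("Polyphemus", "eat", "Odysseus")` and the story fact `("Polyphemus", "cannot", "catch", "Odysseus")`, the function returns `"unachievable"` without needing an extra "Odysseus was not eaten" line.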

Project #2 was more benchmarking for the Parser. I finished putting together my second children's book test set, consisting of sentences from the Tron Legacy tie-in picture book Out of the Dark. The Parser's initial "correct" score was around 25%. By adding some common but previously-unknown words (like "against" and "lying") and hints about their usual part-of-speech to Acuitas' database, I was able to improve the score to about 33% ... very close to last month's score on The Magic School Bus: Inside the Earth.

One of the most common errors I saw was failure to distinguish prepositional adverbs from regular prepositions. In case you're not familiar with the two, here are some examples:

Sentences with prepositions used as such:

I climbed up the ladder.
He jumped out the window.
The captain is on the deck.
Down the stairs she went.

Sentences with prepositions used as adverbs:

Hot air makes the balloon go up.
He threw the spoiled food out.
Turn on the light.
Down came the porcelain vase.

The parser by default was treating each word as either a preposition only or an adverb only, depending on which usage was marked as more common. So I added some procedures for discriminating based on its position and other words in the sentence. (The one construction that's still tricky is "Turn on the light" ... I think I know how to handle this one, but need to implement tracking of transitive and intransitive verbs first.) With the help of these new features I was able to get both test sets scoring over 40% correct.
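To give a feel for position-based discrimination, here is one very simple heuristic: call the word a preposition if something that could start a noun phrase comes right after it, and an adverb otherwise. This is an assumption for illustration and not the procedure the Parser uses; the part-of-speech tags are supplied by hand.

```python
# Tags that can begin a noun phrase (the object of a preposition).
NOUN_PHRASE_STARTS = {"det", "adj", "noun", "pron"}


def classify(tagged, index):
    """Guess the role of the ambiguous word at `index`.

    `tagged` is a list of (word, part_of_speech) pairs for one sentence.
    """
    has_next = index + 1 < len(tagged)
    if has_next and tagged[index + 1][1] in NOUN_PHRASE_STARTS:
        return "preposition"
    return "adverb"


EXAMPLES = {
    "climb": ([("I", "pron"), ("climbed", "verb"), ("up", "?"),
               ("the", "det"), ("ladder", "noun")], 2),
    "throw": ([("He", "pron"), ("threw", "verb"), ("the", "det"),
               ("spoiled", "adj"), ("food", "noun"), ("out", "?")], 5),
    "vase": ([("Down", "?"), ("came", "verb"), ("the", "det"),
              ("porcelain", "adj"), ("vase", "noun")], 0),
    "light": ([("Turn", "verb"), ("on", "?"), ("the", "det"),
               ("light", "noun")], 1),
}
```

The heuristic gets "up the ladder," "threw ... out," and "Down came ..." right, but it labels "on" in "Turn on the light" a preposition. That is the same tricky construction noted above: the word after "on" looks like the start of a prepositional object, so position alone cannot settle it.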

Graphviz sentence diagram key. Words are color-coded by part of speech.

I also downloaded Graphviz and wrote code to convert my parser outputs into Graphviz' input language, producing actual sentence diagrams in the form of graphs (which is non-traditional but works). This makes it much easier to visualize similarities and differences between the parser's output and the human-understood structure of a sentence. With that available, I now present the Acuitas Text Parser's first public benchmark results on the two aforementioned test sets. Each ZIP contains a text file with parser output and unparsed/incorrect/correct breakdowns, and a PDF of golden/actual sentence diagrams for all sentences on which parsing was attempted.
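For anyone curious what such a conversion looks like, here is a minimal sketch that turns a list of tagged words and head-to-dependent links into text in Graphviz's DOT language. The input format, the function name `to_dot`, and the color choices are assumptions for the example; the real parser output is richer than this.

```python
# Arbitrary fill colors per part of speech (hypothetical choices).
COLORS = {"noun": "lightblue", "verb": "salmon", "adj": "khaki"}


def to_dot(words, edges):
    """Build DOT text for one sentence.

    `words` is a list of (token, part_of_speech) pairs.
    `edges` is a list of (head_index, dependent_index, role) triples.
    """
    lines = ["digraph sentence {"]
    for ix, (token, pos) in enumerate(words):
        color = COLORS.get(pos, "white")
        lines.append(f'  w{ix} [label="{token}", style=filled, fillcolor={color}];')
    for head, dependent, role in edges:
        lines.append(f'  w{head} -> w{dependent} [label="{role}"];')
    lines.append("}")
    return "\n".join(lines)


# "What is a cat?" with the verb as the root of the diagram.
WORDS = [("what", "noun"), ("is", "verb"), ("a", "adj"), ("cat", "noun")]
EDGES = [(1, 3, "subj"), (3, 2, "mod"), (1, 0, "dobj")]
```

Saving `to_dot(WORDS, EDGES)` to a file such as `sentence.dot` and running `dot -Tpdf sentence.dot -o sentence.pdf` would render it as a diagram.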

Out Of the Dark - Acuitas Parser Results 07-31-2021
The Magic School Bus: Inside the Earth - Acuitas Parser Results 07-31-2021

The text of The Magic School Bus: Inside the Earth is copyright 1987 to Joanna Cole, publisher Scholastic Inc. Out of the Dark, by Tennant Redbank, is copyright 2010 to Disney Enterprises Inc. Text from these works is reproduced as part of the test results under Fair Use for research purposes. I.e. it's only here so you can see how good my AI is at reading real human books. If you want to read the books yourself, please go buy them.

I'll throw some highlights on the blog. Here's a complex sentence with a dependent clause that the parser gets right:

And here's one where it gets lost in the weeds:

Here's a remaining example of an adverb being mistaken for a preposition:

And here's a prepositional phrase being mistaken for an infinitive:

Confusion about which word a phrase modifies:

Confusion about the variable role of "that," among other problems:

And here's another win, for the road:

Until the next cycle,


Tuesday, June 22, 2021

Acuitas Diary #38 (June 2021)

NOTE: The Feedburner e-mail subscription service is being sunset this month, so if you are subscribed to the blog by e-mail, this will be the last e-mailed blog post you receive. Please consider following directly with a Blogger account or following on social media.

This month marks the culmination of a major overhaul of the Text Parser and Interpreter, which I've been working on since the beginning of the year. As part of that, I have my first attempt at formal benchmarking to show off. I tested the Parser's ability to analyze sentences from a children's book.

Some quick background about these modules: the job of what I call the "Parser" is to take raw text input and turn it into the equivalent of a diagrammed sentence. It tags each word with its part of speech, its role in the sentence (subject, direct object, etc.), and its structural relationships to other words. The "Interpreter" operates on the Parser's output and tries to find meaning. Based on the sentence's discovered structure (and possibly some key words) it will categorize it as a general kind of statement, question, or imperative. For instance, "A cat is an animal" is a statement that establishes a type relationship. "I ate pizza" is a statement that describes an event.

My primary goal for the overhauls was not to add new features, but to pave their way by correcting some structural weaknesses. So despite being a great deal of work, they aren't very exciting to talk about ... I would have to get too deep into minutiae to really describe what I did. The Parser got rearchitected to ease the changing of its "best guess" sentence structure as new information arrives. I also completely changed the output format to better represent the full structure of the sentence (more on this later). The Interpreter overhaul was perhaps even more fundamental. Instead of trying to assign just one category per sentence, the Interpreter now walks a tree structure, finding very general categories of which the sentence is a member before progressing to more specific ones. All the memberships and feature tags that apply to the sentence are now included in the output, which should make things easier for modules like Narrative and Executive that need to know sentence properties.
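To make the tree-walking idea a little more concrete, here is a minimal sketch of how such a walk could look. It is not Acuitas' actual code: the `Category` class, the category names, and the toy feature dictionaries (`mood`, `verb`, `complement`) are all made up for illustration. The only thing it tries to capture is the behavior described above, where a sentence is tested against general categories first and every membership found along the way is kept.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Category:
    """Hypothetical tree node: a name, a membership test, and narrower children."""
    name: str
    matches: Callable[[dict], bool]
    children: List["Category"] = field(default_factory=list)


def collect_memberships(sentence: dict, node: Category) -> List[str]:
    """Return every category the sentence belongs to.

    Children are only tested when their parent matched, so the walk
    moves from general categories to more specific ones.
    """
    if not node.matches(sentence):
        return []
    found = [node.name]
    for child in node.children:
        found.extend(collect_memberships(sentence, child))
    return found


# Invented category tree; the real Interpreter's categories are not shown in the post.
TREE = Category("sentence", lambda s: True, [
    Category("statement", lambda s: s["mood"] == "declarative", [
        Category("type_relationship",
                 lambda s: s["verb"] == "be" and s["complement"] == "noun"),
        Category("event", lambda s: s["verb"] != "be"),
    ]),
    Category("question", lambda s: s["mood"] == "interrogative"),
])

# Toy feature dictionaries standing in for real Parser output.
cat_sentence = {"mood": "declarative", "verb": "be", "complement": "noun"}   # "A cat is an animal"
pizza_sentence = {"mood": "declarative", "verb": "eat", "complement": None}  # "I ate pizza"

print(collect_memberships(cat_sentence, TREE))
print(collect_memberships(pizza_sentence, TREE))
```

The point of returning a list instead of a single label is that downstream modules can check for whichever membership they care about, general or specific.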

Now on to the benchmarking! For a test set, I wanted some examples of simplified, but natural (i.e. not designed to be read by AIs) human text. So I bought children's books. I have two of the original Magic School Bus titles, and two of Disney's Tron Legacy tie-in picture books. These are all "early reader" books, but by the standards of my project they are still very challenging ... even here, the diversity and complexity of the sentences are staggering. So you might wonder why I didn't grab something even more entry-level. My reason is that books for even younger readers tend to rely too heavily on the pictures. Taken out of context, their sentences would be incomplete or not even interesting. And that won't work for Acuitas ... he's blind.

So instead I've got books that are well above his reading level, and early results from the Parser on these datasets are going to be dismal. That's okay. It gives me an end goal to work toward.

How does the test work? If you feed the Parser a sentence, such as "I deeply want to eat a pizza," as an output it produces a data structure like this:

{'subj': [{'ix': [0], 'token': 'i', 'mod': []}],
 'dobj': [{'ix': [3, 4, 5, 6],
           'token': {'subj': [{'ix': [], 'token': '<impl_rflx>', 'mod': []}],
                     'dobj': [{'ix': [6], 'token': 'pizza',
                               'mod': [{'ix': [5], 'token': 'a', 'mod': []}],
                               'ps': 'noun'}],
                     'verb': [{'ix': [4], 'token': 'eat', 'mod': []}],
                     'type': 'inf'},
           'mod': []}],
 'verb': [{'ix': [2], 'token': 'want',
           'mod': [{'ix': [1], 'token': 'deeply', 'mod': []}]}]}

Again, this is expressing the information you would need to diagram the sentence. It shows that the adverb "deeply" modifies the verb "want," that the infinitive phrase "to eat a pizza" functions as the main sentence's direct object, blah blah blah. To make a test set, I transcribe all the sentences from one of the books and create these diagram-structures for them. Then I run a script that inputs all the sentences to the Parser and compares its outputs with the diagram-structures I made. If the Parser's diagram-structure is an exact match for mine, it scores correct.
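For anyone who wants to poke at the structure, here is a small sketch that reads the modifier relationships back out of it and then does the kind of exact-match comparison described above. The dictionary is copied from the example; the helper `modifier_pairs` is a made-up name and is not part of Acuitas. The comparison is plain Python `==` on nested dicts and lists, which is an assumption about how an exact match might be checked, not a description of the real test script.

```python
import copy

# Diagram-structure from the post for "I deeply want to eat a pizza".
golden = {
    'subj': [{'ix': [0], 'token': 'i', 'mod': []}],
    'dobj': [{'ix': [3, 4, 5, 6],
              'token': {'subj': [{'ix': [], 'token': '<impl_rflx>', 'mod': []}],
                        'dobj': [{'ix': [6], 'token': 'pizza',
                                  'mod': [{'ix': [5], 'token': 'a', 'mod': []}],
                                  'ps': 'noun'}],
                        'verb': [{'ix': [4], 'token': 'eat', 'mod': []}],
                        'type': 'inf'},
              'mod': []}],
    'verb': [{'ix': [2], 'token': 'want',
              'mod': [{'ix': [1], 'token': 'deeply', 'mod': []}]}],
}


def modifier_pairs(node):
    """Yield (modifier, head) pairs for single-word modifiers of single words."""
    if not isinstance(node, dict):
        return
    if 'token' in node:
        # Word entry: its token is either a word (str) or a nested phrase (dict).
        head = node['token']
        for mod in node.get('mod', []):
            if isinstance(head, str) and isinstance(mod['token'], str):
                yield (mod['token'], head)
            yield from modifier_pairs(mod)
        if isinstance(head, dict):
            yield from modifier_pairs(head)
    else:
        # Clause: role keys such as 'subj', 'dobj', 'verb' map to lists of entries.
        for value in node.values():
            if isinstance(value, list):
                for entry in value:
                    yield from modifier_pairs(entry)


pairs = list(modifier_pairs(golden))
print(pairs)  # [('a', 'pizza'), ('deeply', 'want')]

# Exact-match scoring: any difference at any depth counts as a miss.
candidate = copy.deepcopy(golden)
match_before = (candidate == golden)
candidate['verb'][0]['mod'] = []      # pretend the parser lost "deeply"
match_after = (candidate == golden)
print(match_before, match_after)      # True False
```

Exact matching is strict on purpose: a parse with one misplaced modifier scores the same as a parse that is wrong everywhere.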

The Parser runs in a simulator/standalone mode for the test. This mode makes it independent of Acuitas' Executive and other main threads. The Parser still utilizes Acuitas' semantic database, but cannot edit it.

There are actually three possible score categories: "correct," "incorrect," and "unparsed." The "unparsed" category is for sentences which contain grammar that I already know the Parser simply doesn't support. (The most painful example: coordinating conjunctions. It can't parse sentences with "and" in them!) I don't bother trying to generate golden diagram-structures for these sentences, but I still have the test script shove them through the Parser to make sure they don't provoke a crash. This produces a fourth score category, "crashed," whose membership we hope is always ZERO. Sentences that have supported grammar but score "incorrect" are failing due to linguistic ambiguities or other quirks the Parser can't yet handle.
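The scoring rules above fit in a few lines. This is only a sketch under my own assumptions: `score_sentence`, `run_benchmark`, and `toy_parse` are invented names, the toy parser is a stand-in that just splits words, and using `None` to mark a sentence with no golden structure is my convention for the example, not necessarily what the real test script does.

```python
from collections import Counter


def score_sentence(parse, sentence, golden):
    """Classify one sentence as 'correct', 'incorrect', 'unparsed', or 'crashed'.

    `parse` is any callable standing in for the Parser. `golden` is the
    hand-made diagram-structure, or None when the sentence uses grammar
    that is known to be unsupported.
    """
    try:
        output = parse(sentence)
    except Exception:
        return "crashed"          # ideally this bucket stays empty
    if golden is None:
        return "unparsed"         # still run through the parser, but not graded
    return "correct" if output == golden else "incorrect"


def run_benchmark(parse, test_set):
    """Tally scores over (sentence, golden) pairs."""
    return Counter(score_sentence(parse, s, g) for s, g in test_set)


def toy_parse(sentence):
    """Stand-in parser: lowercases and splits; raises on empty input."""
    if not sentence:
        raise ValueError("empty input")
    return {"tokens": sentence.lower().rstrip(".").split()}


test_set = [
    ("I ate pizza.", {"tokens": ["i", "ate", "pizza"]}),                   # correct
    ("A cat is an animal.", {"tokens": ["a", "cat", "was", "an", "animal"]}),  # incorrect
    ("Cats and dogs play.", None),                                         # unparsed
    ("", None),                                                            # crashed
]

results = run_benchmark(toy_parse, test_set)
print(dict(results))
```

Note that the crash check comes first, so an unsupported sentence that blows up the parser is counted as "crashed" and not "unparsed."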

Since the goal was to parse natural text, I tried to avoid grooming of the test sentences, with two exceptions. The Parser does not yet support quotations or abbreviations. So I expanded all the abbreviations and broke sentences that contained quotations into two. For example, 'So everyone was happy when Ms. Frizzle announced, "Today we start something new."' becomes 'So everyone was happy when Miz Frizzle announced.' and 'Today we start something new.'
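As an illustration of that grooming step, here is a rough sketch of how it could be automated for the simplest case. Nothing in the post says the grooming was scripted, so treat this purely as an example: the `groom` function and the `ABBREVIATIONS` table are hypothetical (the "Ms." to "Miz" entry follows the example above), and the pattern only handles a single quotation at the end of a sentence, introduced by a comma.

```python
import re

# Hypothetical abbreviation table; only the entry from the post's example.
ABBREVIATIONS = {"Ms.": "Miz"}

# Matches: <lead text>, "<quoted text>" at the end of the sentence.
TRAILING_QUOTE = re.compile(r'^(?P<lead>.*?),\s*"(?P<quote>[^"]*)"\s*$')


def groom(sentence):
    """Expand known abbreviations, then split a trailing quotation off
    into its own sentence. Returns a list of one or two sentences."""
    for abbreviation, expansion in ABBREVIATIONS.items():
        sentence = sentence.replace(abbreviation, expansion)
    match = TRAILING_QUOTE.match(sentence)
    if match is None:
        return [sentence]
    lead = match.group("lead").rstrip() + "."
    return [lead, match.group("quote").strip()]


original = 'So everyone was happy when Ms. Frizzle announced, "Today we start something new."'
groomed = groom(original)
for line in groomed:
    print(line)
```

Quotations in the middle of a sentence, nested quotes, or curly quote characters would all need more handling than this.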

It is also worth noting that my Magic School Bus test sets only contain the "main plot" text. I've left out the "science reports" and the side dialogue between the kids. Maybe I'll build test sets that contain these eventually, but for now it would be too much work.

A pie chart showing results of the Text Parser benchmark on data set "The Magic School Bus: Inside the Earth." 37% Unattempted, 28% Incorrect, and 33% Correct.

On to the results!

So far I have fully completed just one test set, namely The Magic School Bus: Inside the Earth, consisting of 98 sentences. The Parser scores roughly one out of three on this one, with no crashes. It also parses the whole book in 0.71 seconds (averaged over 10 runs). That's probably not a stellar performance, but it's much faster than a human reading, and that's all I really want.
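For reference, an averaged timing like the one quoted above can be taken with something along these lines. This is a generic sketch, not the actual benchmark script: `average_runtime` is a made-up name, and `parse` is any callable standing in for the Parser.

```python
import time


def average_runtime(parse, sentences, runs=10):
    """Mean wall-clock seconds to parse every sentence once, over `runs` passes."""
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        for sentence in sentences:
            parse(sentence)
        totals.append(time.perf_counter() - start)
    return sum(totals) / len(totals)


# Stand-in parser that just counts how often it is called.
calls = []


def counting_parse(sentence):
    calls.append(sentence)
    return sentence.split()


sentences = ["I ate pizza.", "A cat is an animal."]
mean_seconds = average_runtime(counting_parse, sentences, runs=10)
print(f"{mean_seconds:.6f} seconds per pass")
```

Averaging over several passes smooths out one-off delays such as disk caching on the first run.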

Again, dismal. But we'll see how this improves over the coming years!

Until the next cycle,