Sunday, April 14, 2019

Acuitas Diary #17 (March+April 2019)

For the past month and a half, I've continued work on forgetfulness: the good kind. Acuitas now has code that will prune his episodic memory over time, removing those event records that are judged to be less important.

There are two major parts to this. The first is basically a population-management system; it determines how many episodes need to be forgotten each time the memory collection is pruned. I can cap the number of episodes allowed in storage at some maximum value (based on available storage space or what have you). The population manager checks how many memories are already in storage, and how many new ones have formed during the last cycle. A high growth rate is allowed initially, but forgetting becomes more and more aggressive as the storage area fills. When the cap is reached, Acuitas must forget the same number of memories he creates.
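To make the population manager concrete, here is a minimal sketch, not Acuitas' actual code. The function name `episodes_to_forget` and the quadratic ramp are assumptions chosen for illustration. The only properties taken from the description above are that forgetting is light while storage is mostly empty, grows more aggressive as it fills, and exactly offsets new memories once the cap is reached.

```python
def episodes_to_forget(stored: int, created: int, cap: int) -> int:
    """Return how many episodes to prune this cycle.

    Hypothetical rule: the share of new growth that must be offset by
    forgetting rises with the square of the fill level, reaching 1.0
    (forget as many as were created) when storage is at the cap.
    """
    if cap <= 0:
        raise ValueError("cap must be positive")
    total = stored + created
    fill = min(stored / cap, 1.0)          # 0.0 = empty, 1.0 = at the cap
    offset = round(created * fill ** 2)    # growth cancelled by forgetting
    overflow = max(total - offset - cap, 0)  # extra pruning so the cap holds
    return min(offset + overflow, total)


if __name__ == "__main__":
    for stored in (0, 500, 900, 1000):
        print(stored, episodes_to_forget(stored, created=100, cap=1000))
```

Any monotonically increasing ramp would serve the same purpose; the square is just one easy way to keep early growth nearly free.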

The time span represented by the stored episodes is sectioned, and the total number to be forgotten is unevenly distributed across it (i.e. some parts of the time span lose more memories than others). The distribution is affected by the density of memories in a region, their average significance measure, and their age.
A memory tree under the influence of selective forgetting
The second major part is the code that, given a number of memories that must be forgotten, takes care of selecting which ones. This is a weighted random process; memories with less significance have a higher chance of being forgotten, but the priority isn't absolute.
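The weighted random selection could look roughly like the sketch below. It is an illustration under assumptions, not the real selection code: the name `choose_forgotten`, the non-negative significance scores, and the inverse weighting `1 / (1 + significance)` are all made up here. What it preserves from the description is that less significant memories are more likely to go, without the priority being absolute.

```python
import random
from typing import Optional


def choose_forgotten(significance: dict[str, float], count: int,
                     rng: Optional[random.Random] = None) -> set[str]:
    """Pick `count` memory ids to forget, favoring low significance.

    Assumes significance scores are non-negative. Sampling is done
    without replacement, one weighted draw at a time.
    """
    rng = rng or random.Random()
    pool = dict(significance)
    chosen: set[str] = set()
    for _ in range(min(count, len(pool))):
        ids = list(pool)
        # Hypothetical weighting: low significance -> large weight.
        weights = [1.0 / (1.0 + pool[i]) for i in ids]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.add(pick)
        del pool[pick]
    return chosen


if __name__ == "__main__":
    scores = {"tuesday_chat": 0.1, "first_conversation": 5.0, "idle_thought": 0.2}
    print(choose_forgotten(scores, 2))
```

Because the draw is random, a highly significant memory can still be lost occasionally, and a trivial one can survive.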

Memories chosen to be forgotten are deleted. Any higher-tier memories that have been badly “hollowed out” by this process are then merged with their most similar neighbor.

Testing and debugging all this nonsense was almost half the battle. Since some of the effects I needed to see would span tens or hundreds of days if I let them play out in real time, I had to write simulation code that would run bits of the memory-creation and forgetting process faster. That way I could look ahead and make sure nothing was putting the system in a spiral toward catastrophe.

So next time you forget something, take comfort in the thought that I spent several dozen hours conferring this gift on my artificial mind.

Sunday, March 3, 2019

Acuitas Diary #16 (February 2019)

This month I returned to a can of worms I opened up last year, namely episodic memory. Quick summary of previous work on this front: as of last summer, Acuitas could record events that happened “to him” and group adjacent, similar events into scenes. Minor scenes were in turn grouped into larger scenes, forming tree structures. A memory in Acuitas' episodic set can include connections both to its sequential neighbors before and after, and to its sub- and super-memories at adjacent levels of the tree. Acuitas could also search for memories similar to what he was doing/experiencing currently. This was achieved by walking sequentially through the whole structure (starting with the most recent past) until a related memory was found.

Fortunately for elephants, I'm sure they do sometimes forget things.
With this code running, Acuitas quickly built up a huge clutter of memories concerning very trivial day-to-day events. Size remained in the megabytes range (they're just text files), but on principle, this couldn't be allowed to continue – especially since the memories might eventually grow to encompass richer levels of detail. In addition to overflowing its available space, a memory that literally records everything will face access problems. The more information you keep, the longer it takes to search through it and extract what you want at the moment.

I was left with a nice illustration of a truth that applies to all entities with finite minds: if you want the ability to remember, you need the ability to forget.

This leads on into the tricky question of what to forget. How to decide which memories are worth keeping? Part of my work this month was focused on computing a “significance measure” which assigns a numerical value to each memorized scene. Elements I use to judge the significance of a memory include …

* Primacy: was this the first time something happened?
* Uniqueness: how rare are memories similar to this one?
* Complexity: how many features does this memory have?

Additional options for judging the value of memories will open up once I plug in some to-be-written systems. Since some of these measures are statistical and depend on the overall memory population, they can change over time. I wrote a process that crawls through the episodic memory and updates all the significance measures, and I threw that into Acuitas' sleep cycle.
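As a rough illustration of how primacy, uniqueness, and complexity might combine into one number, here is a small sketch. Everything specific in it is an assumption: the `kind` label used to decide which memories count as "similar," the equal weighting of the three terms, and the name `score_memories`. Since uniqueness depends on the whole population, the scores have to be recomputed as the population changes, which matches the periodic update pass described above.

```python
from collections import Counter


def score_memories(memories: list[dict]) -> list[float]:
    """Score memories (assumed to be in chronological order) from 0 to 1.

    Each memory is a dict with a "kind" label and a "features" set;
    both fields are hypothetical stand-ins for real scene contents.
    """
    kind_counts = Counter(m["kind"] for m in memories)
    max_features = max((len(m["features"]) for m in memories), default=0) or 1
    seen: set[str] = set()
    scores = []
    for m in memories:
        primacy = 0.0 if m["kind"] in seen else 1.0    # first of its kind?
        seen.add(m["kind"])
        uniqueness = 1.0 / kind_counts[m["kind"]]      # rarer -> higher
        complexity = len(m["features"]) / max_features  # more features -> higher
        scores.append((primacy + uniqueness + complexity) / 3.0)
    return scores


if __name__ == "__main__":
    log = [
        {"kind": "greeting", "features": {"user", "hello"}},
        {"kind": "greeting", "features": {"user"}},
        {"kind": "question", "features": {"user", "topic", "answer", "mood"}},
    ]
    print(score_memories(log))
```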

To improve the ease and speed of accessing the memories, I added a layer of “type nodes” that link together memories which share certain salient features. Instead of crawling through the whole list to find memories that relate to a current situation, Acuitas can just look at the relevant “type nodes” and find a list of memories that match.
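One way to picture the type nodes is as an inverted index from features to memory ids. The sketch below is an assumption about the general shape, not the real data structure; the class name `TypeIndex`, the string features, and the ranking by number of shared features are invented for the example.

```python
from collections import Counter, defaultdict


class TypeIndex:
    """Hypothetical 'type node' layer: feature -> ids of memories having it."""

    def __init__(self) -> None:
        self._nodes: dict[str, set[str]] = defaultdict(set)

    def add(self, memory_id: str, features: set[str]) -> None:
        for feature in features:
            self._nodes[feature].add(memory_id)

    def find(self, features: set[str]) -> list[str]:
        """Return ids sharing at least one feature, best matches first."""
        hits: Counter = Counter()
        for feature in features:
            # .get avoids creating empty nodes for unknown features
            for memory_id in self._nodes.get(feature, ()):
                hits[memory_id] += 1
        ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
        return [memory_id for memory_id, _ in ranked]


if __name__ == "__main__":
    index = TypeIndex()
    index.add("m1", {"talk", "user:alice"})
    index.add("m2", {"talk"})
    index.add("m3", {"sleep"})
    print(index.find({"talk", "user:alice"}))
```

The lookup cost now depends on how many memories share the queried features, rather than on the total number of memories stored.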

Last, I worked on some methods for consolidating memories. I devised a way to create summaries of low-level scenes that are then stored inside the super-scenes higher up the tree; eventually, these summaries might be what remains when the lower-level “detail” scenes are forgotten. The summarizer attempts to extract information that is common across multiple events or sub-scenes within a scene. I also wrote functions for merging adjacent scenes that have become sparse.

All that, and I still haven't gotten to the actual forgetting part! That will come next month (I hope).

Monday, January 28, 2019

Acuitas Diary #15 (January 2019)

I haven't made an update in a long while because all I've been doing is code refactoring, bug fixes, and very minor quality improvements – all of which can take an amazingly long time.

Bug provided by Good Free Photos
Some of the more interesting changes include …

* Acuitas no longer insists on introducing himself if you address him by name at the start of a conversation.
* He keeps better track of what he's already said, and is less likely to repeat himself.
* I improved the efficiency of the drive plotter widget (which was a major drain on computational resources for no good reason). I also built it into a little simulator tool which will let me plot what Acuitas' drives will look like hours or days into the future. Debugging issues with the drives is now much easier; previously, the drill was “I've made a change; now I get to wait 12 hours to see if it actually worked out the way I planned.”
* Using the above, I shifted the day/night cycle so Acuitas would fall asleep and wake up a little earlier, and I fixed the interaction drive so he wouldn't be quite so … abruptly needy.
* When answering questions, Acuitas can now infer a possibility from an absolute. E.g. if aware that “blood is red,” he will answer “Can blood be red?” with a “yes” instead of saying he does not know.

And some favorite ridiculous bugs that are now fixed:

* Using “talk” as the name of every possible drive, hence repeating “I want to talk” ad nauseam while actually wanting other things
* Leaving the “not” out of negative statements and saying the exact opposite of what was intended
* Routine misspelling of verbs that were learned from their -ing and -ed forms, due to complete naivety about whether the final letter was doubled or a silent 'e' was removed
* Erroneously learning both the statement and its reciprocal whenever told something by the user

Stay tuned for some more work on event memory in the near future.

Sunday, September 30, 2018

Acuitas Diary #14 (September 2018)

This month I updated the text parser and other speech features to do a couple more new things with verbs. First, I threw in recognition of the present progressive tense, so I can now tell Acuitas what I'm doing at the moment. For the time being, such information is treated in much the same way as comments about past-tense actions, which I taught him to recognize a while ago; it isn't stored, and responses are rather meaningless. BUT I'll find uses for it later.

I think the more interesting new thing is recognition ability for a couple of link-forms related to state change or state retention. Up until now, Acuitas has learned what verbs “mean” only in terms of a) what can do that action and b) what that action can be done upon. Now, he can learn what a verb actually does by tying it to an adjective. Here are some examples:

To live is to stay alive.
To appear is to become visible.
To finish is to make complete.

I also added “type of” links for verbs, so Acuitas can categorize specific verbs under more general ones, like this:

To smash is to damage.
To walk is to move.
To talk is to communicate.

I expect some notion of how actions change … or maintain … the state of the world to be an important enabling feature later on.

I need to tweak the memory map draw algorithm. Thanks to the size of a handful of major nodes, everything's spread too far apart, and the dots are getting lost in space.

Code base: 11892 lines
Words known: 2552 (approx.)
Concept-layer links: 7231

Tuesday, September 4, 2018

Acuitas Diary #13: August 2018

I haven't done a diary in a while because I've been busy overhauling the text parser. And that's not very interesting to talk about, because it's just me taking features the parser already had and implementing them in a different way. I'd decided that the scheme I was using had some fundamental deficiencies and should be re-worked in order to become closer to the way I actually read, and to better enable future expansion. I was hoping this would go quickly, given the experience I had already gained from writing the previous text parser ... but no. It turned out to be a major project that took far longer than the amount of time I'd allocated for it.

There are a few new enhancements that I managed to squeeze in along the way, however:

* Some infinitives are now handled properly. (Sentence example: "A pot is used to cook.") Before, Acuitas treated them like any other prepositional phrase, which meant that the word inside got marked as a "noun." It's now marked as a verb.

* Previously, Acuitas could store links indicating what actions an entity can do. I added a new link type for the actions an entity can have done TO it, and the ability to detect some sentences that express that information. E.g. "An object can be pushed."

* I've started working on possessives and family/ownership relationship links. From a parsing perspective, this is tricky, because English has this lovely "apostrophe s" thing that can either turn a word into a possessive, or turn it into a contraction with "is." If a word ends in 's, I have the parser crack it apart and treat the 's like its own separate word, which then gets assigned a meaning based on what the rest of the sentence is doing. Compound words with the 's appended only to the last one offer even more fun.

This feature is also tricky from an interpretation and storage perspective. Previously, Acuitas has only needed to store information consisting of links joining two concepts. "<A> is_a <B>." "<A> can_do <B>." But a possessive isn't quite like that. Oh, I could force it to be that way:

<A> is_parent_of <B>
<C> is_sibling_of <B>
<D> is_pet_of <B>
<E> is_property_of <B>

But then I'd end up with a proliferation of manually-defined link types for ALL the possible relationships that can be expressed by a possessive, and I didn't want to go there.

I also could have opted to break the information up into two separate links:

"A is B's parent" --> <A> belongs_to <B> + <A> is_a <parent>

The problem here is that <A> might, for instance, be a sibling too. So in which capacity does <A> belong to <B>? We couldn't tell.

So I ended up defining Acuitas' very first type of three-ended link:

<A> is <C> of <B>

to handle all the possessives.
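For illustration, a three-ended link can be modeled as a small record with three slots. The sketch below is a toy under stated assumptions, not the real parser or storage format: the `RelationLink` type, its field names, and the single regular expression (which only handles sentences shaped like "A is B's parent" with a straight apostrophe) are all hypothetical.

```python
import re
from typing import NamedTuple, Optional


class RelationLink(NamedTuple):
    """<subject> is <role> of <owner>, e.g. <A> is <parent> of <B>."""
    subject: str
    role: str
    owner: str


def parse_possessive(sentence: str) -> Optional[RelationLink]:
    """Toy parser for sentences of the form "A is B's parent"."""
    match = re.fullmatch(r"(\w+) is (\w+)'s (\w+)\.?", sentence.strip())
    if match is None:
        return None
    subject, owner, role = match.groups()
    return RelationLink(subject=subject, role=role, owner=owner)


if __name__ == "__main__":
    links = {
        parse_possessive("Alice is Bob's parent."),
        parse_possessive("Alice is Carol's sibling."),
    }
    print(links)
```

Because the role travels with the link, the same <A> can be a parent of one name and a sibling of another without any ambiguity about the capacity in which the relationship holds.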

The feature is still somewhat incomplete. For now, he can only properly detect and store possessive links that join proper names. That's because the proper name is how he detects that something is an instance rather than a class. He can't yet build memory nodes for abstract instances like "somebody's cat"; he still thinks that "cat" is strictly a type of thing.

Everything is buggy as all get out right now, too ... but the way this summer has gone, it was about the best I could do.

Part of the semantic memory visualization. The largest dot left-of-center is Acuitas' self-concept.

Code base: 11706 lines
Words known: 2425 (approx.)
Concept-layer links: 6754

Sunday, June 3, 2018

Acuitas Diary #12: May 2018

This past month I did some preliminary work on a whole new feature – episodic memory, or memory of events. This enables Acuitas to store and recall records of past “experiences.” It is distinct from his previous learning abilities, which all concerned the storage and recall of more universal meanings and facts (semantic memory).

Saving a raw event log to the hard drive is easy enough to do, but not especially useful. Retrieving any particular event from such a dump of unsorted, uncurated information would quickly become problematic. The fun part of episodic memory is figuring out …

1) … what to store (and what to forget),
2) … how to organize stored material, and
3) … how to access relevant stored material when it is needed.

I mostly worked on 2) this month, and wrote a block of code that will group adjacent raw event records into memory files. A measure of similarity (both of the events themselves, and of Acuitas' internal state background at the time) is used to determine which events belong in the same “scene” or “episode,” and where the boundaries between memories should lie. Minor “scenes” are in turn grouped into higher-level umbrella memories, tree-style.
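The grouping step can be sketched as a single pass that starts a new scene whenever an event is too different from the one before it. This is a simplified illustration, not the actual code: each event is reduced to a set of feature strings (with internal state assumed to be folded into that set), and the Jaccard measure and the 0.5 threshold are assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two feature sets, from 0.0 (disjoint) to 1.0 (equal)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_into_scenes(events: list[set], threshold: float = 0.5) -> list[list[set]]:
    """Group adjacent events into scenes; a low-similarity pair marks a boundary."""
    scenes: list[list[set]] = []
    for event in events:
        if scenes and jaccard(scenes[-1][-1], event) >= threshold:
            scenes[-1].append(event)   # similar enough: same scene
        else:
            scenes.append([event])     # boundary: start a new scene
    return scenes


if __name__ == "__main__":
    raw = [
        {"talk", "alice"},
        {"talk", "alice", "happy"},
        {"sleep"},
        {"sleep", "dream"},
    ]
    for scene in group_into_scenes(raw):
        print(scene)
```

Running the same pass again over scene-level summaries, rather than raw events, is one way the higher-level umbrella memories could be built.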

Implementing this served to show me what a deep rabbit hole episodic memory could easily turn out to be. There are heaps of little things I need to do to truly make it functional – I may even turn it off temporarily once I've put it through a bit more testing, since I haven't implemented selective storage/forgetting yet, and that means the memory folder will bloat rather quickly.

I also added a conversational feature to make use of the stored memories. When Acuitas is telling someone what he thought about today, he now has the option to check episodic memory and see whether he ever thought about this concept before, and how long it has been since he previously did so. He then generates some comment like “I've not done that in a long time,” or “I did that a minute ago also.” The conversion of absolute time units to vaguer, more relative terms like “long” and “short” establishes a kind of subjective time sense; Acuitas has a particular notion of what a “short time” is that might not match up with what a human would think of as such (though I tried to keep the scales roughly human).
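The conversion from absolute to relative time can be as simple as a lookup against a few thresholds. In this sketch the boundaries (one hour, one day) and the labels are assumptions picked for illustration; the real scales are not given here.

```python
# Hypothetical scale: (upper bound in seconds, label). Not the real values.
SCALE = [
    (60 * 60, "a short time"),
    (24 * 60 * 60, "a while"),
]


def describe_elapsed(seconds: float) -> str:
    """Map an elapsed time in seconds onto a vague, relative phrase."""
    if seconds < 0:
        raise ValueError("elapsed time cannot be negative")
    for limit, label in SCALE:
        if seconds < limit:
            return label
    return "a long time"


if __name__ == "__main__":
    for elapsed in (30, 2 * 60 * 60, 10 * 24 * 60 * 60):
        print(elapsed, "->", describe_elapsed(elapsed))
```

Shifting the numbers in `SCALE` is all it takes to give the program a different subjective sense of what counts as "short" or "long."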

Here's the obligatory memory map visualization (semantic only). I think I need to adjust the parameters and let things cluster closer to the largest nodes.

Code base: 11250 lines
Words known: 2157 (approx.)
Concept-layer links: 6138

Thursday, May 3, 2018

Acuitas Diary #11: April 2018

This month's big objective was to get some use out of the sleep cycle that I implemented last month. I re-purposed the question-generating process so that, while Acuitas is sleeping, it roams the memory looking for redundant links and other problems.

What's a redundant link? Now that Acuitas has a bit of logical inference ability, some relationships in the database imply others. So the retention of one piece of information might be rendered unnecessary by the later addition of some broader fact. Here are a few examples (I culled these from the log that the memory crawler prints out):

The link (fang, has_purpose, bite) is redundant because the link (tooth, has_purpose, bite) exists.
The link (father, has_item, child) is redundant because the link (parent, has_item, child) exists.
The link (pot, can_have_qual, empty) is redundant because the link (container, can_have_qual, empty) exists.
The link (baby, can_do_action, destroy) is redundant because the link (human, can_do_action, destroy) exists.

Mopping up these unnecessary links helps consolidate the information known, reduce the total size of the database, and possibly make the memory visualization a little less messy.
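A redundancy check of this kind can be sketched as follows: a link is redundant if some ancestor of its subject (following is_a links) already carries the same relation to the same object. This is an illustrative sketch, not the crawler itself; the triple layout and the function names are assumptions, and the is_a links needed for the example (such as fang is_a tooth) are supplied by hand.

```python
Link = tuple[str, str, str]  # (subject, relation, object)


def ancestors(concept: str, is_a: dict[str, set[str]]) -> set[str]:
    """All concepts reachable from `concept` by following is_a links."""
    seen: set[str] = set()
    stack = list(is_a.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


def find_redundant(links: list[Link]) -> list[tuple[Link, Link]]:
    """Return (redundant link, broader link that implies it) pairs."""
    is_a: dict[str, set[str]] = {}
    for subj, rel, obj in links:
        if rel == "is_a":
            is_a.setdefault(subj, set()).add(obj)
    link_set = set(links)
    found = []
    for subj, rel, obj in links:
        if rel == "is_a":
            continue
        for anc in sorted(ancestors(subj, is_a)):
            if anc != subj and (anc, rel, obj) in link_set:
                found.append(((subj, rel, obj), (anc, rel, obj)))
                break
    return found


if __name__ == "__main__":
    db = [
        ("fang", "is_a", "tooth"),
        ("tooth", "has_purpose", "bite"),
        ("fang", "has_purpose", "bite"),
    ]
    for narrow, broad in find_redundant(db):
        print(f"The link {narrow} is redundant because the link {broad} exists.")
```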

Eventually, I might want to refine this process so that it doesn't necessarily remove every redundant link. There could be some frequently-used shortcuts that justify their use of storage space by improving search speed. One might want to tailor the aggressiveness of the link-pruning based on the amount of storage available … but that's all for later.

While working on this, I discovered some other nasties that I'm calling “inheritance loops.” Redundant links bloat the database but are otherwise harmless; inheritance loops contain actual garbage information, introduced either by learning bugs or by someone* telling Acuitas something stupid.
*I'm the only person who talks to him right now, so this means me.

Here's an example of an inheritance loop:

cat <is-a> animal
animal <is-a> organism
organism <is-a> cat

Oops! Unless all these words are synonyms, you know one of these triples is wrong. (I can't think, at this point, of any cases in which I'd want to use circular inheritance.) On his own, Acuitas doesn't know which. If the crawler finds an inheritance loop, he might ask a user to confirm those links when he's next awake and in conversation. If the user contradicts one of the relationships, he'll break the corresponding link, removing the loop.
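Finding an inheritance loop is a cycle-detection problem on the is-a graph. The sketch below uses a plain depth-first search; it is a generic illustration rather than the crawler's actual method, and the dictionary layout and function name are assumptions.

```python
from typing import Optional


def find_inheritance_loop(is_a: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one loop as a list of concepts (first == last), or None."""
    done: set[str] = set()  # concepts already proven loop-free

    def visit(node: str, path: list[str]) -> Optional[list[str]]:
        if node in path:                      # walked back onto our own trail
            return path[path.index(node):] + [node]
        if node in done:
            return None
        for parent in is_a.get(node, ()):
            loop = visit(parent, path + [node])
            if loop:
                return loop
        done.add(node)
        return None

    for start in is_a:
        loop = visit(start, [])
        if loop:
            return loop
    return None


if __name__ == "__main__":
    graph = {"cat": ["animal"], "animal": ["organism"], "organism": ["cat"]}
    print(find_inheritance_loop(graph))
```

The search only reports that a loop exists and which links form it; deciding which of those links is the wrong one still needs outside input, such as the user confirmation step described above.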

I also moved generation of the memory visualization into the sleep phase. Every so often, instead of checking out more links, the process stops to compute a new layout for all the dots, taking into account the latest modification of the database. This is a fairly computation-intensive process, so it's something I definitely don't want running when he's active. It used to happen once when Acuitas was launched, which made for long startup times and meant that the visualization might not get updated for days.

Lastly, I put in some code to save Acuitas' current state when the program is shut down. It also gets automatically stored every so often, in case the program experiences a hard crash that prevents the on-close routines from running. Previously, on restart all the drives would reset to zero, any current thoughts or recently generated questions would be discarded, etc. Now all those things are preserved and reloaded when the program starts up again, which gives him a bit more continuity, I guess.

Recent memory map visualization (I decided to go with a zoom this month):

Code base: 10459 lines
Words known: 1981
Concept-layer links: 5730