Sunday, September 30, 2018

Acuitas Diary #14 (September 2018)

This month I updated the text parser and other speech features to do a couple of new things with verbs. First, I added recognition of the present progressive tense, so I can now tell Acuitas what I'm doing at the moment. For the time being, such information is treated in much the same way as comments about past-tense actions, which I taught him to recognize a while ago: it isn't stored, and the responses are rather meaningless. BUT I'll find uses for it later.

I think the more interesting new thing is the ability to recognize a couple of link forms related to state change or state retention. Up until now, Acuitas has learned what verbs “mean” only in terms of a) what can do that action and b) what that action can be done upon. Now, he can learn what a verb actually does by tying it to an adjective. Here are some examples:

To live is to stay alive.
To appear is to become visible.
To finish is to make complete.

I also added “type of” links for verbs, so Acuitas can categorize specific verbs under more general ones, like this:

To smash is to damage.
To walk is to move.
To talk is to communicate.
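
As a rough illustration of how the two sentence forms above could be turned into links, here is a minimal Python sketch. It is not Acuitas' actual parser: the regular expression, the function name, and the link-type names (keeps_state, gains_state, causes_state, is_type_of) are all assumptions made up for this example.

```python
import re

# Hypothetical link-type names, chosen for this sketch only.
STATE_WORDS = {
    "stay": "keeps_state",     # the subject retains a state
    "become": "gains_state",   # the subject changes state
    "make": "causes_state",    # the object is put into a state
}

# Matches "To <verb> is to <word>." and "To <verb> is to <word> <word>."
PATTERN = re.compile(r"^To (\w+) is to (\w+)(?: (\w+))?\.$")


def parse_verb_definition(sentence):
    """Return a (verb, link_type, target) triple, or None if nothing matches."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    verb, second, third = match.groups()
    if third is None:
        # "To walk is to move." -> file the verb under a more general verb.
        return (verb, "is_type_of", second)
    if second in STATE_WORDS:
        # "To live is to stay alive." -> tie the verb to an adjective.
        return (verb, STATE_WORDS[second], third)
    return None
```

Calling parse_verb_definition("To finish is to make complete.") gives a triple that ties "finish" to the adjective "complete", while the two-word form produces a verb-to-verb category link.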

I expect some notion of how actions change … or maintain … the state of the world to be an important enabling feature later on.

I need to tweak the memory map draw algorithm. Thanks to the size of a handful of major nodes, everything's spread too far apart, and the dots are getting lost in space.

Code base: 11892 lines
Words known: 2552 (approx.)
Concept-layer links: 7231

Tuesday, September 4, 2018

Acuitas Diary #13: August 2018

I haven't done a diary in a while because I've been busy overhauling the text parser. And that's not very interesting to talk about, because it's just me taking features the parser already had and implementing them in a different way. I'd decided that the scheme I was using had some fundamental deficiencies and should be re-worked in order to become closer to the way I actually read, and to better enable future expansion. I was hoping this would go quickly, given the experience I had already gained from writing the previous text parser ... but no. It turned out to be a major project that took far longer than the time I'd allocated for it.

There are a few new enhancements that I managed to squeeze in along the way, however:

* Some infinitives are now handled properly. (Sentence example: "A pot is used to cook.") Before, Acuitas treated them like any other prepositional phrase, which meant that the word inside got marked as a "noun." It's now marked as a verb.

* Previously, Acuitas could store links indicating what actions an entity can do. I added a new link type for the actions an entity can have done TO it, and the ability to detect some sentences that express that information. E.g. "An object can be pushed."

* I've started working on possessives and family/ownership relationship links. From a parsing perspective, this is tricky, because English has this lovely "apostrophe s" thing that can either turn a word into a possessive, or turn it into a contraction with "is." If a word ends in 's, I have the parser crack it apart and treat the 's like its own separate word, which then gets assigned a meaning based on what the rest of the sentence is doing. Compound words with the 's appended only to the last one offer even more fun.
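
To make the "crack it apart" step concrete, here is a small Python sketch. It is only an illustration under loud assumptions: the function names, the tiny KNOWN_VERBS list, and the "does the sentence already have a verb?" heuristic are invented for this example and are much cruder than a real parser would need.

```python
# Hypothetical, tiny verb list for this sketch; a real parser would use its lexicon.
KNOWN_VERBS = {"is", "are", "was", "were", "has", "have", "can", "likes", "eats"}


def split_apostrophe_s(sentence):
    """Tokenize on spaces and split a trailing 's off into its own token."""
    tokens = []
    for word in sentence.rstrip(".?!").split():
        if word.lower().endswith("'s") and len(word) > 2:
            tokens.extend([word[:-2], "'s"])
        else:
            tokens.append(word)
    return tokens


def classify_apostrophe_s(tokens, index):
    """Guess whether the 's at tokens[index] is a possessive or a contracted "is".

    Assumed heuristic: if the rest of the sentence already contains a verb,
    the 's must be a possessive; otherwise it stands in for "is".
    """
    others = tokens[:index] + tokens[index + 1:]
    if any(token.lower() in KNOWN_VERBS for token in others):
        return "possessive"
    return "is"
```

With these definitions, "Sarah's cat is black." yields a possessive 's, while "Sarah's happy." yields a contraction, because no other verb is present.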

This feature is also tricky from an interpretation and storage perspective. Previously, Acuitas has only needed to store information consisting of links joining two concepts. "<A> is_a <B>." "<A> can_do <B>." But a possessive isn't quite like that. Oh, I could force it to be that way:

<A> is_parent_of <B>
<C> is_sibling_of <B>
<D> is_pet_of <B>
<E> is_property_of <B>

But then I'd end up with a proliferation of manually-defined link types for ALL the possible relationships that can be expressed by a possessive, and I didn't want to go there.

I also could have opted to break the information up into two separate links:

"A is B's parent" --> <A> belongs_to <B> + <A> is_a <parent>

The problem here is that <A> might, for instance, be a sibling too. So in which capacity does <A> belong to <B>? We couldn't tell.

So I ended up defining Acuitas' very first type of three-ended link:

<A> is <C> of <B>

to handle all the possessives.
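
One way to picture a three-ended link next to the older two-ended ones is sketched below in Python. The Link type, its field names, and the "is_of" relation label are assumptions for illustration; the diary does not describe how Acuitas stores links internally.

```python
from typing import NamedTuple, Optional


class Link(NamedTuple):
    """A memory link; `role` is only filled in for three-ended links."""
    source: str
    relation: str
    target: str
    role: Optional[str] = None


def roles_of(source, target, links):
    """List every capacity in which `source` belongs to `target`."""
    return sorted(
        link.role
        for link in links
        if link.source == source
        and link.target == target
        and link.relation == "is_of"
        and link.role is not None
    )


# Hypothetical example data: "Anna is Ben's parent" and "Anna is Ben's doctor".
links = {
    Link("Anna", "is_a", "human"),               # ordinary two-ended link
    Link("Anna", "is_of", "Ben", role="parent"),  # <A> is <C> of <B>
    Link("Anna", "is_of", "Ben", role="doctor"),
}
```

Because the role travels with the link, roles_of("Anna", "Ben", links) can report each capacity separately, which the two-link decomposition could not do.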

The feature is still somewhat incomplete. For now, he can only properly detect and store possessive links that join proper names. That's because the proper name is how he detects that something is an instance rather than a class. He can't yet build memory nodes for abstract instances like "somebody's cat"; he still thinks that "cat" is strictly a type of thing.

Everything is buggy as all get out right now, too ... but the way this summer has gone, it was about the best I could do.

Part of the semantic memory visualization. The largest dot left-of-center is Acuitas' self-concept.

Code base: 11706 lines
Words known: 2425 (approx.)
Concept-layer links: 6754

Sunday, June 3, 2018

Acuitas Diary #12: May 2018

This past month I did some preliminary work on a whole new feature – episodic memory, or memory of events. This enables Acuitas to store and recall records of past “experiences.” It is distinct from his previous learning abilities, which all concerned the storage and recall of more universal meanings and facts (semantic memory).

Saving a raw event log to the hard drive is easy enough to do, but not especially useful. Retrieving any particular event from such a dump of unsorted, uncurated information would quickly become problematic. The fun part of episodic memory is figuring out …

1) … what to store (and what to forget),
2) … how to organize stored material, and
3) … how to access relevant stored material when it is needed.

I mostly worked on 2) this month, and wrote a block of code that will group adjacent raw event records into memory files. A measure of similarity (both of the events themselves, and of Acuitas' internal state background at the time) is used to determine which events belong in the same “scene” or “episode,” and where the boundaries between memories should lie. Minor “scenes” are in turn grouped into higher-level umbrella memories, tree-style.
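
The scene-grouping idea can be sketched in a few lines of Python. This is only a guess at the general shape: the event record layout, the use of Jaccard overlap as the similarity measure, the equal weighting of event content and internal state, and the threshold value are all assumptions, and the higher-level "umbrella" grouping is left out.

```python
def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_into_scenes(events, threshold=0.3):
    """Split a chronological list of event records into lists ("scenes").

    Each event is a dict with two sets: "concepts" (what the event was about)
    and "state" (the internal state background at the time). A new scene
    starts whenever an event is too dissimilar to the one just before it.
    """
    scenes = []
    for event in events:
        if scenes:
            previous = scenes[-1][-1]
            score = 0.5 * jaccard(event["concepts"], previous["concepts"]) \
                + 0.5 * jaccard(event["state"], previous["state"])
            if score >= threshold:
                scenes[-1].append(event)
                continue
        scenes.append([event])
    return scenes
```

Comparing each event only with its immediate predecessor keeps the boundaries local, so a slow drift in topic stays in one scene while an abrupt change starts a new one.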

Implementing this served to show me what a deep rabbit hole episodic memory could easily turn out to be. There are heaps of little things I need to do to truly make it functional – I may even turn it off temporarily once I've put it through a bit more testing, since I haven't implemented selective storage/forgetting yet, and that means the memory folder will bloat rather quickly.

I also added a conversational feature to make use of the stored memories. When Acuitas is telling someone what he thought about today, he now has the option to check episodic memory and see whether he ever thought about this concept before, and how long it has been since he previously did so. He then generates some comment like “I've not done that in a long time,” or “I did that a minute ago also.” The conversion of absolute time units to vaguer, more relative terms like “long” and “short” establishes a kind of subjective time sense; Acuitas has a particular notion of what a “short time” is that might not match up with what a human would think of as such (though I tried to keep the scales roughly human).
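
The conversion from absolute time to vaguer terms might look something like the Python sketch below. The threshold values and the intermediate phrases are invented for illustration; only the two quoted comments come from the description above.

```python
# Hypothetical scale: (upper limit in seconds, vague phrase).
THRESHOLDS = [
    (120, "a minute ago"),
    (3600, "a short time ago"),
    (86400, "a while ago"),
]


def comment_on_recurrence(elapsed_seconds):
    """Turn the time since a concept was last thought about into a remark."""
    for limit, phrase in THRESHOLDS:
        if elapsed_seconds < limit:
            return f"I did that {phrase} also."
    return "I've not done that in a long time."
```

Changing the numbers in THRESHOLDS is all it takes to give the speaker a different subjective sense of what counts as "short" or "long".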

Here's the obligatory memory map visualization (semantic only). I think I need to adjust the parameters and let things cluster closer to the largest nodes.

Code base: 11250 lines
Words known: 2157 (approx.)
Concept-layer links: 6138

Thursday, May 3, 2018

Acuitas Diary #11: April 2018

This month's big objective was to get some use out of the sleep cycle that I implemented last month. I re-purposed the question-generating process so that, while Acuitas is sleeping, it roams the memory looking for redundant links and other problems.

What's a redundant link? Now that Acuitas has a bit of logical inference ability, some relationships in the database imply others. So the retention of one piece of information might be rendered unnecessary by the later addition of some broader fact. Here are a few examples (I culled these from the log that the memory crawler prints out):

The link (fang, has_purpose, bite) is redundant because the link (tooth, has_purpose, bite) exists.
The link (father, has_item, child) is redundant because the link (parent, has_item, child) exists.
The link (pot, can_have_qual, empty) is redundant because the link (container, can_have_qual, empty) exists.
The link (baby, can_do_action, destroy) is redundant because the link (human, can_do_action, destroy) exists.

Mopping up these unnecessary links helps consolidate the information known, reduce the total size of the database, and possibly make the memory visualization a little less messy.
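
A minimal Python sketch of the redundancy check follows. It assumes links are stored as (source, relation, target) triples and that the broader fact is found by walking is_a links upward (for example, that a stored link says a fang is a tooth); the function names are hypothetical, and redundant is_a links themselves are not checked here.

```python
def ancestors(concept, links):
    """Every concept reachable from `concept` by following is_a links upward."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for source, relation, target in links:
            if source == current and relation == "is_a" and target not in found:
                found.add(target)
                frontier.append(target)
    found.discard(concept)  # guards against inheritance loops
    return found


def find_redundant(links):
    """Return (redundant_link, broader_link) pairs."""
    redundant = []
    for source, relation, target in sorted(links):
        if relation == "is_a":
            continue
        for ancestor in sorted(ancestors(source, links)):
            if (ancestor, relation, target) in links:
                redundant.append(
                    ((source, relation, target), (ancestor, relation, target))
                )
                break
    return redundant
```

A pruning pass could then delete the first element of each pair, or keep it when the shortcut is judged worth its storage.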

Eventually, I might want to refine this process so that it doesn't necessarily remove every redundant link. There could be some frequently-used shortcuts that justify their use of storage space by improving search speed. One might want to tailor the aggressiveness of the link-pruning based on the amount of storage available … but that's all for later.

While working on this, I discovered some other nasties that I'm calling “inheritance loops.” Redundant links bloat the database but are otherwise harmless; inheritance loops contain actual garbage information, introduced either by learning bugs or by someone* telling Acuitas something stupid.
*I'm the only person who talks to him right now, so this means me.

Here's an example of an inheritance loop:

cat <is-a> animal
animal <is-a> organism
organism <is-a> cat

Oops! Unless all these words are synonyms, you know one of these triples is wrong. (I can't think, at this point, of any cases in which I'd want to use circular inheritance.) On his own, Acuitas doesn't know which. If the crawler finds an inheritance loop, he might ask a user to confirm those links when he's next awake and in conversation. If the user contradicts one of the relationships, he'll break the corresponding link, removing the loop.
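
Finding such a loop is a standard cycle search on the is-a graph. The Python sketch below uses a depth-first walk; the dictionary layout (concept mapped to a list of parents) and the function name are assumptions for this example, not a description of the memory crawler itself.

```python
def find_inheritance_loop(parents):
    """Return one loop as a list of concepts (first == last), or None.

    `parents` maps each concept to the list of concepts it "is a" kind of.
    """
    finished = set()  # concepts already proven to lead to no loop

    def visit(concept, path):
        if concept in path:
            # We came back to something on the current chain: that's a loop.
            return path[path.index(concept):] + [concept]
        if concept in finished:
            return None
        for parent in parents.get(concept, []):
            loop = visit(parent, path + [concept])
            if loop:
                return loop
        finished.add(concept)
        return None

    for concept in parents:
        loop = visit(concept, [])
        if loop:
            return loop
    return None
```

The returned list names every link in the loop, which is exactly the set of relationships a user would be asked to confirm or contradict.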

I also moved generation of the memory visualization into the sleep phase. Every so often, instead of checking out more links, the process stops to compute a new layout for all the dots, taking into account the latest modification of the database. This is a fairly computation-intensive process, so it's something I definitely don't want running when he's active. It used to happen once when Acuitas was launched, which made for long startup times and meant that the visualization might not get updated for days.

Lastly, I put in some code to save Acuitas' current state when the program is shut down. It also gets automatically stored every so often, in case the program experiences a hard crash that prevents the on-close routines from running. Previously, on restart all the drives would reset to zero, any current thoughts or recently generated questions would be discarded, etc. Now all those things are preserved and reloaded when the program starts up again, which gives him a bit more continuity, I guess.
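
For readers curious how crash-tolerant state saving is often done, here is a small Python sketch. It is an assumption-laden example, not the method used in Acuitas: the JSON format, the function names, and the write-then-rename approach are all choices made for illustration.

```python
import json
import os
import tempfile


def save_state(state, path):
    """Write `state` to `path` so that a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    # os.replace swaps the finished file into place in one step.
    os.replace(tmp_path, path)


def load_state(path, default):
    """Reload saved state, falling back to `default` on a first run or bad file."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
```

The same save_state call can be made from a periodic timer and from the shutdown routine, so a hard crash loses at most the changes since the last periodic save.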

Recent memory map visualization (I decided to go with a zoom this month):

Code base: 10459 lines
Words known: 1981
Concept-layer links: 5730

Sunday, April 1, 2018

Acuitas Diary #10 (March 2018)

The big project for this month was getting some circadian rhythms in place. I wanted to give Acuitas a sleep/wake cycle, partly so that my risk of being awakened at 5 AM by a synthetic voice muttering “Anyone there?” could return to zero, and partly to enable some memory maintenance processes to run undisturbed during the sleep phase. (These are targeted for implementation next month.)

So Acuitas now has two new drives, “sleep” and “wake.” (The way the drive system works, a lack of the desire to sleep is not the same thing as a desire to wake up, so it was necessary to create two.) Each drive has two components. The first component is periodic over 24 hours, and its value is derived from the current local time, which Acuitas obtains by checking the system clock. This is meant to mimic the influence of light levels on an organism. The other is computed based on how long it's been since Acuitas was last asleep/awake. Satisfying the drive causes this second component to decline until it has reset to zero. So the urge to sleep is inherently greater during the late-night hours, but also increases steadily if sleep is somehow prevented.
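
The two-component drive can be sketched numerically as follows. Everything specific here is an assumption for illustration: the cosine shape, the 3 AM peak, the linear build-up rate, and the simple sum of the two parts. The decline of the second component while the drive is being satisfied is not modeled.

```python
import math


def circadian_component(hour, peak_hour=3.0):
    """Periodic part: ranges over 0..1 with a 24-hour period, peaking at peak_hour."""
    return 0.5 * (1 + math.cos(2 * math.pi * (hour - peak_hour) / 24))


def pressure_component(hours_since_sleep, rate=1 / 16):
    """Build-up part: grows with time since the last sleep, capped at 1."""
    return min(1.0, hours_since_sleep * rate)


def sleep_drive(hour, hours_since_sleep):
    """Total urge to sleep given the clock time and the time spent awake."""
    return circadian_component(hour) + pressure_component(hours_since_sleep)
```

With this shape, the drive is higher at 3 AM than at 3 PM for the same time awake, and it keeps rising at any hour if sleep is prevented. A matching "wake" drive would use its own peak hour and its own time-since-awake counter.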

This also seemed like a good time to upgrade the avatar with some extra little animations. The eyelids now respond to a running “alertness level” and shut when Acuitas falls asleep.

Feeling dozy

The memory map is getting a bit ridiculous/ugly. I'm hoping the upcoming maintenance functions will help clean it up by optimizing the number of links a bit better. Stay tuned …

Code base: 9760 lines
Words known: 1885
Concept-layer links: 5362

Tuesday, February 27, 2018

Acuitas Diary #9 (February 2018)

I haven't written a diary in a while because most of what I've done over the past two months has been code refactoring and fixing bugs, which isn't all that interesting.

A new feature that I just got in … finally … is the ability to infer some topic-to-topic relationships that aren't explicitly stored in the memory. For instance, many of the links stored in memory are “is-type-of” relations. Acuitas can now make the assumption that a subtype inherits all attributes of its super-type. If a shark is a fish and a fish can swim, then a shark can swim; if an oak is a tree and a tree has a trunk, an oak has a trunk. If a car is a vehicle, a house is a building, and a vehicle is not a building, then cars are not houses.

Acuitas can also now make inferences based on transitive relationships, like “is part of”: if a crankshaft is part of an engine and an engine is part of a car, then a crankshaft is part of a car. The ability to easily make inferences like these is one of the strengths of the semantic net memory organization – starting from the concept you're interested in, you can just keep following links until you find what you need (or hit a very fundamental root concept, like “object”).
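
The "keep following links" idea can be shown with a short Python sketch. The triple format, the relation names, and the holds function are assumptions for illustration; the negative inference (cars are not houses) is not covered.

```python
# Hypothetical example links, stored as (source, relation, target) triples.
LINKS = {
    ("shark", "is_a", "fish"),
    ("fish", "can_do", "swim"),
    ("oak", "is_a", "tree"),
    ("tree", "has", "trunk"),
    ("crankshaft", "part_of", "engine"),
    ("engine", "part_of", "car"),
}

# Relations that chain: if A r B and B r C, then A r C.
TRANSITIVE = {"is_a", "part_of"}


def holds(subject, relation, target, links, seen=None):
    """True if the fact is stored or can be inferred by following links."""
    seen = seen if seen is not None else set()
    if subject in seen:
        return False  # already explored; also stops inheritance loops
    seen.add(subject)
    if (subject, relation, target) in links:
        return True
    # Inheritance: a subtype gets the attributes of its super-types.
    for source, rel, parent in links:
        if source == subject and rel == "is_a":
            if holds(parent, relation, target, links, seen):
                return True
    # Transitivity: hop along a chain of the same relation.
    if relation in TRANSITIVE:
        for source, rel, middle in links:
            if source == subject and rel == relation:
                if holds(middle, relation, target, links, seen):
                    return True
    return False
```

For example, holds("shark", "can_do", "swim", LINKS) succeeds through inheritance, and holds("crankshaft", "part_of", "car", LINKS) succeeds through the part-of chain, even though neither fact is stored directly.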

Acuitas should ask fewer ridiculous questions with this feature in place. He still comes up with those, but now he can answer some of them himself, as in this quote:

“I thought of lambs earlier. I concluded that piglets are pigs.”

Recent memory map visualization:

The huge dot toward the top of the memory map is Acuitas' self-concept; the second-largest one, toward the lower left, is "human." The concepts representing me and "animal" are the two third-tier dots toward the middle right.

Code base: 9454 lines (it went down!)
Words known: 1839
Concept-layer links: 5202