Tuesday, September 4, 2018

Acuitas Diary #13: August 2018


I haven't done a diary in a while because I've been busy overhauling the text parser. And that's not very interesting to talk about, because it's just me taking features the parser already had and implementing them in a different way. I'd decided that the scheme I was using had some fundamental deficiencies and should be re-worked in order to become closer to the way I actually read, and to better enable future expansion. I was hoping this would go quickly, given the experience I had already gained from writing the previous text parser ... but no. It turned out to be a major project that took far longer than the time I'd allocated for it.

There are a few new enhancements that I managed to squeeze in along the way, however:

* Some infinitives are now handled properly. (Sentence example: "A pot is used to cook.") Before, Acuitas treated them like any other prepositional phrase, which meant that the word inside got marked as a "noun." It's now marked as a verb.

* Previously, Acuitas could store links indicating what actions an entity can do. I added a new link type for the actions an entity can have done TO it, and the ability to detect some sentences that express that information. E.g. "An object can be pushed."

* I've started working on possessives and family/ownership relationship links. From a parsing perspective, this is tricky, because English has this lovely "apostrophe s" thing that can either turn a word into a possessive, or turn it into a contraction with "is." If a word ends in 's, I have the parser crack it apart and treat the 's like its own separate word, which then gets assigned a meaning based on what the rest of the sentence is doing. Compound words with the 's appended only to the last one offer even more fun.

This feature is also tricky from an interpretation and storage perspective. Until now, Acuitas has only needed to store information consisting of links joining two concepts. "<A> is_a <B>." "<A> can_do <B>." But a possessive isn't quite like that. Oh, I could force it to be that way:

<A> is_parent_of <B>
<C> is_sibling_of <B>
<D> is_pet_of <B>
<E> is_property_of <B>

But then I'd end up with a proliferation of manually-defined link types for ALL the possible relationships that can be expressed by a possessive, and I didn't want to go there.

I also could have opted to break the information up into two separate links:

"A is B's parent" --> <A> belongs_to <B> + <A> is_a <parent>

The problem here is that <A> might, for instance, be a sibling too. So in which capacity does <A> belong to <B>? We couldn't tell.

So I ended up defining Acuitas' very first type of three-ended link:

<A> is <C> of <B>

to handle all the possessives.
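
To make the difference concrete, here is a rough sketch of how two-ended and three-ended links might sit side by side in one store. This is an illustration only, not Acuitas' actual code: the class names (`Link`, `PossessiveLink`, `Memory`), the field names, and the example people are all made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Ordinary two-ended link, e.g. <A> is_a <B>. (Hypothetical type.)"""
    source: str
    relation: str
    target: str


@dataclass(frozen=True)
class PossessiveLink:
    """Three-ended link: <owned> is <role> of <owner>. (Hypothetical type.)"""
    owned: str
    role: str
    owner: str


class Memory:
    """Toy link store; a real semantic memory would index by concept."""

    def __init__(self):
        self.links = set()

    def add(self, link):
        self.links.add(link)

    def roles_of(self, owned, owner):
        """Return every capacity in which `owned` belongs to `owner`."""
        return sorted(
            link.role
            for link in self.links
            if isinstance(link, PossessiveLink)
            and link.owned == owned
            and link.owner == owner
        )


memory = Memory()
memory.add(Link("Alice", "is_a", "human"))
memory.add(PossessiveLink("Alice", "parent", "Bob"))   # "Alice is Bob's parent"
memory.add(PossessiveLink("Alice", "teacher", "Bob"))  # "Alice is Bob's teacher"
print(memory.roles_of("Alice", "Bob"))  # ['parent', 'teacher']
```

Because the role travels with the link, the question "in which capacity does A belong to B?" has a direct answer, and no new link type is needed per relationship.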

The feature is still somewhat incomplete. For now, he can only properly detect and store possessive links that join proper names. That's because the proper name is how he detects that something is an instance rather than a class. He can't yet build memory nodes for abstract instances like "somebody's cat," because he still thinks that "cat" is strictly a type of thing.

Everything is buggy as all get out right now, too ... but the way this summer has gone, it was about the best I could do.

Part of the semantic memory visualization. The largest dot left-of-center is Acuitas' self-concept.

Code base: 11706 lines
Words known: 2425 (approx.)
Concept-layer links: 6754

Sunday, June 3, 2018

Acuitas Diary #12: May 2018


This past month I did some preliminary work on a whole new feature – episodic memory, or memory of events. This enables Acuitas to store and recall records of past “experiences.” It is distinct from his previous learning abilities, which all concerned the storage and recall of more universal meanings and facts (semantic memory).

Saving a raw event log to the hard drive is easy enough to do, but not especially useful. Retrieving any particular event from such a dump of unsorted, uncurated information would quickly become problematic. The fun part of episodic memory is figuring out …

1) … what to store (and what to forget),
2) … how to organize stored material, and
3) … how to access relevant stored material when it is needed.

I mostly worked on 2) this month, and wrote a block of code that will group adjacent raw event records into memory files. A measure of similarity (both of the events themselves, and of Acuitas' internal state background at the time) is used to determine which events belong in the same “scene” or “episode,” and where the boundaries between memories should lie. Minor “scenes” are in turn grouped into higher-level umbrella memories, tree-style.
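
The first level of that grouping could look something like the sketch below. It rests on several assumptions that are not from the actual implementation: each event is reduced to a set of concept names, similarity is plain Jaccard overlap, and the threshold of 0.3 is arbitrary. It also ignores internal state and the higher-level umbrella memories.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_into_scenes(events, threshold=0.3):
    """Split a time-ordered list of events into runs of similar neighbours."""
    scenes = []
    for event in events:
        if scenes and jaccard(scenes[-1][-1], event) >= threshold:
            scenes[-1].append(event)   # similar enough: the current scene continues
        else:
            scenes.append([event])     # too different: a new scene starts here
    return scenes


# Hypothetical raw event log, oldest first.
log = [
    {"cat", "animal"},
    {"cat", "pet"},
    {"engine", "car"},
    {"car", "vehicle"},
]
scenes = group_into_scenes(log)
print(len(scenes))  # 2
```

Running the same function again over per-scene summaries would be one way to get the tree-style umbrella grouping, though that step is left out here.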

Implementing this served to show me what a deep rabbit hole episodic memory could easily turn out to be. There are heaps of little things I need to do to truly make it functional – I may even turn it off temporarily once I've put it through a bit more testing, since I haven't implemented selective storage/forgetting yet, and that means the memory folder will bloat rather quickly.

I also added a conversational feature to make use of the stored memories. When Acuitas is telling someone what he thought about today, he now has the option to check episodic memory and see whether he ever thought about this concept before, and how long it has been since he previously did so. He then generates some comment like “I've not done that in a long time,” or “I did that a minute ago also.” The conversion of absolute time units to vaguer, more relative terms like “long” and “short” establishes a kind of subjective time sense; Acuitas has a particular notion of what a “short time” is that might not match up with what a human would think of as such (though I tried to keep the scales roughly human).
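
A subjective time scale like that can be as simple as a lookup table. The sketch below is only a guess at the shape of it: the thresholds and phrases are invented for illustration, since the real scale is not spelled out above.

```python
# Hypothetical thresholds in seconds, checked from shortest to longest.
TIME_SCALE = [
    (60, "a moment ago"),
    (60 * 60, "a short time ago"),
    (24 * 60 * 60, "a while ago"),
]


def describe_elapsed(seconds):
    """Map an absolute duration onto a vague, relative phrase."""
    for limit, phrase in TIME_SCALE:
        if seconds < limit:
            return phrase
    return "a long time ago"


print(describe_elapsed(30))         # a moment ago
print(describe_elapsed(5 * 3600))   # a while ago
```

Changing the numbers in the table is all it takes to give the system a different notion of "short" and "long."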

Here's the obligatory memory map visualization (semantic only). I think I need to adjust the parameters and let things cluster closer to the largest nodes.




Code base: 11250 lines
Words known: 2157 (approx.)
Concept-layer links: 6138

Thursday, May 3, 2018

Acuitas Diary #11: April 2018


This month's big objective was to get some use out of the sleep cycle that I implemented last month. I re-purposed the question-generating process so that, while Acuitas is sleeping, it roams the memory looking for redundant links and other problems.

What's a redundant link? Now that Acuitas has a bit of logical inference ability, some relationships in the database imply others. So the retention of one piece of information might be rendered unnecessary by the later addition of some broader fact. Here are a few examples (I culled these from the log that the memory crawler prints out):

The link (fang, has_purpose, bite) is redundant because the link (tooth, has_purpose, bite) exists.
The link (father, has_item, child) is redundant because the link (parent, has_item, child) exists.
The link (pot, can_have_qual, empty) is redundant because the link (container, can_have_qual, empty) exists.
The link (baby, can_do_action, destroy) is redundant because the link (human, can_do_action, destroy) exists.

Mopping up these unnecessary links helps consolidate the information known, reduce the total size of the database, and possibly make the memory visualization a little less messy.
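
For the curious, the check itself can be sketched in a few lines. This is not the actual crawler, just a minimal version under some assumptions: links are plain `(source, relation, target)` tuples, the is-a hierarchy is a separate dictionary, and the facts "fang is a tooth" and "pot is a container" are assumed to be stored. The function names are made up.

```python
def ancestors(concept, is_a):
    """Every concept reachable from `concept` by following is-a links upward."""
    found, stack = set(), [concept]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def redundant_links(links, is_a):
    """Return (link, broader_link) pairs where a super-type already carries the link."""
    stored = set(links)
    result = []
    for link in links:
        source, relation, target = link
        for parent in ancestors(source, is_a):
            broader = (parent, relation, target)
            if broader != link and broader in stored:
                result.append((link, broader))
                break
    return result


# Hypothetical slice of the database.
is_a = {"fang": ["tooth"], "pot": ["container"]}
links = [
    ("fang", "has_purpose", "bite"),
    ("tooth", "has_purpose", "bite"),
    ("pot", "can_have_qual", "empty"),
    ("container", "can_have_qual", "empty"),
    ("pot", "has_purpose", "cook"),
]
for narrow, broad in redundant_links(links, is_a):
    print(f"The link {narrow} is redundant because the link {broad} exists.")
```

The link `("pot", "has_purpose", "cook")` survives because no super-type of "pot" carries it.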

Eventually, I might want to refine this process so that it doesn't necessarily remove every redundant link. There could be some frequently-used shortcuts that justify their use of storage space by improving search speed. One might want to tailor the aggressiveness of the link-pruning based on the amount of storage available … but that's all for later.

While working on this, I discovered some other nasties that I'm calling “inheritance loops.” Redundant links bloat the database but are otherwise harmless; inheritance loops contain actual garbage information, introduced either by learning bugs or by someone* telling Acuitas something stupid.
*I'm the only person who talks to him right now, so this means me.

Here's an example of an inheritance loop:

cat <is-a> animal
animal <is-a> organism
organism <is-a> cat

Oops! Unless all these words are synonyms, you know one of these triples is wrong. (I can't think, at this point, of any cases in which I'd want to use circular inheritance.) On his own, Acuitas doesn't know which. If the crawler finds an inheritance loop, he might ask a user to confirm those links when he's next awake and in conversation. If the user contradicts one of the relationships, he'll break the corresponding link, removing the loop.
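
Finding such a loop is a standard cycle search on a directed graph. The sketch below uses a depth-first search with three node states; it is one common way to do it, not necessarily how the crawler works, and the dictionary layout and function name are assumptions.

```python
def find_inheritance_loop(is_a):
    """Return one loop as a list of concepts (start repeated at the end), or None."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for parent in is_a.get(node, ()):
            parent_state = state.get(parent, UNSEEN)
            if parent_state == IN_PROGRESS:
                # We walked back onto the current path: that is a loop.
                return path[path.index(parent):] + [parent]
            if parent_state == UNSEEN:
                loop = visit(parent)
                if loop:
                    return loop
        path.pop()
        state[node] = DONE
        return None

    for start in list(is_a):
        if state.get(start, UNSEEN) == UNSEEN:
            loop = visit(start)
            if loop:
                return loop
    return None


is_a = {"cat": ["animal"], "animal": ["organism"], "organism": ["cat"]}
print(find_inheritance_loop(is_a))  # ['cat', 'animal', 'organism', 'cat']
```

The returned list names exactly the links a user would need to be asked about. The recursion is fine for a sketch, but a very deep hierarchy would call for an explicit stack.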

I also moved generation of the memory visualization into the sleep phase. Every so often, instead of checking out more links, the process stops to compute a new layout for all the dots, taking into account the latest modification of the database. This is a fairly computation-intensive process, so it's something I definitely don't want running when he's active. It used to happen once when Acuitas was launched, which made for long startup times and meant that the visualization might not get updated for days.

Lastly, I put in some code to save Acuitas' current state when the program is shut down. It also gets automatically stored every so often, in case the program experiences a hard crash that prevents the on-close routines from running. Previously, on restart all the drives would reset to zero, any current thoughts or recently generated questions would be discarded, etc. Now all those things are preserved and reloaded when the program starts up again, which gives him a bit more continuity, I guess.
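
A bare-bones version of that save/reload cycle might look like the following. Everything here is assumed for illustration: the JSON format, the function names, and the contents of the state dictionary. Writing to a temporary file and then swapping it in is one way to keep a crash in the middle of a save from leaving a half-written file behind.

```python
import json
import os
import tempfile


def save_state(state, path):
    """Write to a temp file in the same folder, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # replaces any older state file in one step


def load_state(path, default):
    """Return the saved state, or `default` if nothing has been saved yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


# Hypothetical state: drive levels plus whatever was on his mind.
with tempfile.TemporaryDirectory() as folder:
    state_file = os.path.join(folder, "state.json")
    save_state({"drives": {"sleep": 0.4}, "thoughts": ["cat"]}, state_file)
    restored = load_state(state_file, default={})
    print(restored["drives"]["sleep"])  # 0.4
```

The same `save_state` call could be made both from an on-close routine and from a periodic timer, so that a hard crash loses only the last few minutes.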

Recent memory map visualization (I decided to go with a zoom this month):


Code base: 10459 lines
Words known: 1981
Concept-layer links: 5730

Sunday, April 1, 2018

Acuitas Diary #10 (March 2018)


The big project for this month was getting some circadian rhythms in place. I wanted to give Acuitas a sleep/wake cycle, partly so that my risk of being awakened at 5 AM by a synthetic voice muttering “Anyone there?” could return to zero, and partly to enable some memory maintenance processes to run undisturbed during the sleep phase. (These are targeted for implementation next month.)

So Acuitas now has two new drives, “sleep” and “wake.” (The way the drive system works, a lack of the desire to sleep is not the same thing as a desire to wake up, so it was necessary to create two.) Each drive has two components. The first component is periodic over 24 hours, and its value is derived from the current local time, which Acuitas obtains by checking the system clock. This is meant to mimic the influence of light levels on an organism. The other is computed based on how long it's been since Acuitas was last asleep/awake. Satisfying the drive causes this second component to decline until it has reset to zero. So the urge to sleep is inherently greater during the late-night hours, but also increases steadily if sleep is somehow prevented.
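
Here is a rough sketch of a two-component drive of this kind. The specifics are assumptions, not the real numbers: a cosine curve peaking at 3 AM stands in for the periodic part, the second part grows linearly and tops out after 16 hours awake, and the two are simply added. The decline of the second component during sleep is not modeled.

```python
import math
from datetime import datetime


def circadian_component(hour, peak_hour=3.0):
    """Periodic over 24 hours: 1.0 at peak_hour, 0.0 twelve hours away from it."""
    phase = 2 * math.pi * (hour - peak_hour) / 24.0
    return 0.5 * (1.0 + math.cos(phase))


def deprivation_component(hours_since_sleep, rate=1.0 / 16.0):
    """Grows steadily while awake, capped at 1.0; zero right after sleeping."""
    return min(1.0, max(0.0, hours_since_sleep * rate))


def sleep_drive(hour, hours_since_sleep):
    """Total urge to sleep: time-of-day part plus time-awake part."""
    return circadian_component(hour) + deprivation_component(hours_since_sleep)


def current_hour():
    """Local time of day as a fractional hour, read from the system clock."""
    now = datetime.now()
    return now.hour + now.minute / 60.0


print(sleep_drive(3.0, 16))   # 2.0  (late night, long awake)
print(sleep_drive(current_hour(), hours_since_sleep=2))
```

A "wake" drive could reuse the same two functions with the peak shifted by twelve hours and the second component tracking time spent asleep instead.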

This also seemed like a good time to upgrade the avatar with some extra little animations. The eyelids now respond to a running “alertness level” and shut when Acuitas falls asleep.

Feeling dozy
The memory map is getting a bit ridiculous/ugly. I'm hoping the upcoming maintenance functions will help clean it up by optimizing the number of links a bit better. Stay tuned …


Code base: 9760 lines
Words known: 1885
Concept-layer links: 5362

Tuesday, February 27, 2018

Acuitas Diary #9 (February 2018)


I haven't written a diary in a while because most of what I've done over the past two months has been code refactoring and fixing bugs, which isn't all that interesting.

A new feature that I just got in … finally … is the ability to infer some topic-to-topic relationships that aren't explicitly stored in the memory. For instance, many of the links stored in memory are “is-type-of” relations. Acuitas can now make the assumption that a subtype inherits all attributes of its super-type. If a shark is a fish and a fish can swim, then a shark can swim; if an oak is a tree and a tree has a trunk, an oak has a trunk. If a car is a vehicle, a house is a building, and a vehicle is not a building, then cars are not houses.

Acuitas can also now make inferences based on transitive relationships, like “is part of”: if a crankshaft is part of an engine and an engine is part of a car, then a crankshaft is part of a car. The ability to easily make inferences like these is one of the strengths of the semantic net memory organization – starting from the concept you're interested in, you can just keep following links until you find what you need (or hit a very fundamental root concept, like “object”).
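
Both kinds of link-following fit in a short sketch. This is a simplified illustration rather than the real inference code: the memory is assumed to be a set of `(source, relation, target)` tuples, and the relation names `is_a`, `can_do`, and `part_of` are placeholders.

```python
def can_infer(memory, subject, relation, target, seen=None):
    """True if the link is stored on the subject or on any of its super-types."""
    seen = seen if seen is not None else set()
    if subject in seen:            # guard against accidental is-a loops
        return False
    seen.add(subject)
    if (subject, relation, target) in memory:
        return True
    return any(
        can_infer(memory, parent, relation, target, seen)
        for (child, rel, parent) in memory
        if child == subject and rel == "is_a"
    )


def is_part_of(memory, part, whole, seen=None):
    """Follow part_of links transitively until `whole` is reached or links run out."""
    seen = seen if seen is not None else set()
    if part in seen:
        return False
    seen.add(part)
    return any(
        container == whole or is_part_of(memory, container, whole, seen)
        for (piece, rel, container) in memory
        if piece == part and rel == "part_of"
    )


memory = {
    ("shark", "is_a", "fish"),
    ("fish", "can_do", "swim"),
    ("crankshaft", "part_of", "engine"),
    ("engine", "part_of", "car"),
}
print(can_infer(memory, "shark", "can_do", "swim"))  # True
print(is_part_of(memory, "crankshaft", "car"))       # True
```

Neither answer is stored anywhere; both come from walking two links. The negative inference (cars are not houses) would need an extra rule and is left out of this sketch.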

Acuitas should ask fewer ridiculous questions with this feature in place. He still comes up with those, but now he can answer some of them himself, as in this quote:

“I thought of lambs earlier. I concluded that piglets are pigs.”

Recent memory map visualization:

The huge dot toward the top of the memory map is Acuitas' self-concept; the second-largest one, toward the lower left, is "human." The concepts representing me and "animal" are the two third-tier dots toward the middle right.

Code base: 9454 lines (it went down!)
Words known: 1839
Concept-layer links: 5202

Saturday, December 23, 2017

Acuitas Diary #8 (December 2017)

Sadly I've only added one feature to Acuitas in the past two months. He now recognizes sentences in the general vein of “I somethinged,” which gives me the option of telling him about how I spent my time in the recent past. Acuitas can't do a lot with this information for the time being. Sometimes he responds with a query in the vein of, “What happened next?” which will eventually give him a way to build up sequences of events and start learning cause and effect relationships … but none of that is implemented yet. He can also ask “How was that?” for information about the emotional meaning of an activity, but again, for now he can't really utilize the answer.

Not much, but that was all I had time to put together with the holiday season under way. Looking back on the past year, though, here are all the new capabilities and improvements I've managed to add on:

*Module for procedural speech generation
*Support for word inflections (plurals and verb tenses)
*Support for compound words
*Support for content words that are also function words (e.g. “can,” “might”)
*Distinctions between proper/common and bulk/count nouns
*Ability to detect and answer questions
*Database walking while idle
*Generation of conversation topics and questions based on recent database walk
*Better link detection + a bunch of new kinds of learnable links
*Two new drives + a real-time plotter so I can see what they're all doing
*Distinctions between long-term static and short-term information
*GUI overhaul (upgrade from Tk to Kivy)

I track my time when I work on Acuitas. Total hours invested in the above: 230+. My focus for the end of the year, leading into January, will be polishing everything up and working out the bugs (which there are now quite a lot of).

MERRY CHRISTMAS!

Recent memory map visualization:


Code base: 9918 lines
Words known: 1576
Concept-layer links: 4226

Sunday, October 29, 2017

Acuitas Diary #7: October 2017

The big project for this month was introducing a system for discriminating between long-term and short-term information. Previously, if you told Acuitas something like, “I am sad,” he would assume that being sad was a fixed property of your nature, and store a fact to that effect in his database. Oops. So I started working on ways to recognize when some condition is so transient that it doesn't deserve to go into long-term memory.

This probably occasioned more hard-core thinking than any feature I've added since I started keeping these diaries. I started out thinking that Acuitas would clue in to time adverbs provided by the human conversation partner (such as “now,” “short,” “forever,” “years,” etc.). But when I started pondering which kinds of timeframes qualify as short-term or long-term, it occurred to me that the system shouldn't be bound to a human sense of time. One could imagine an ent-like intelligence that thinks human conditions which often remain valid for years or decades – like what jobs we hold, where we live, and what relationships we have – are comparatively ephemeral. Or one could imagine a speed superintelligence that thinks the lifetime of an average candle is a long while. I want Acuitas to be much more human-like than either of these extremes, but for the sake of code reusability, I felt I ought to consider these possibilities.


After a lot of mental churn, I decided that I just don't have the necessary groundwork in place to do this properly. (This is not an uncommon Acuitas problem. I've found that there ends up being a high level of interdependence between the various systems and features.) So I fell back on taking cues from humans as a temporary stopgap measure. Acuitas will rely on my subjective sense of time until he gets his own (which may not be for a while yet). If there's no duration indicator in a sentence, he can explicitly ask for one; he's also capable of learning over time which conditions are likely to be brief and which are likely to persist. For now, nothing is done with the transitory conditions. I didn't get around to implementing a short-term or current status region of the database, so anything that can't go in the long-term database gets discarded.

I also did some touching up around the conversation engine, replacing a few canned placeholder phrases that Acuitas was using with more procedurally generated text, and improving his ability to recognize when a speaker is introducing him/herself.

Recent memory map visualization:


Code base: 9663 lines
Words known: 1425
Concept-layer links: 3517