Tuesday, September 4, 2018

Acuitas Diary #13: August 2018


I haven't done a diary in a while because I've been busy overhauling the text parser. And that's not very interesting to talk about, because it's just me taking features the parser already had and implementing them in a different way. I'd decided that the scheme I was using had some fundamental deficiencies and should be reworked to come closer to the way I actually read, and to better enable future expansion. I was hoping this would go quickly, given the experience I had already gained from writing the previous text parser ... but no. It turned out to be a major project that took far longer than the time I'd allocated for it.

There are a few new enhancements that I managed to squeeze in along the way, however:

* Some infinitives are now handled properly. (Sentence example: "A pot is used to cook.") Before, Acuitas treated them like any other prepositional phrase, which meant that the word inside got marked as a "noun." It's now marked as a verb.

* Previously, Acuitas could store links indicating what actions an entity can do. I added a new link type for the actions an entity can have done TO it, and the ability to detect some sentences that express that information. E.g. "An object can be pushed."

* I've started working on possessives and family/ownership relationship links. From a parsing perspective, this is tricky, because English has this lovely "apostrophe s" thing that can either turn a word into a possessive, or turn it into a contraction with "is." If a word ends in 's, I have the parser crack it apart and treat the 's like its own separate word, which then gets assigned a meaning based on what the rest of the sentence is doing. Compound words with the 's appended only to the last one offer even more fun.
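To make the "crack it apart" step concrete, here is a minimal sketch in Python. It is not Acuitas' actual code: the function names, the tiny `KNOWN_VERBS` list, and the disambiguation rule are all assumptions made up for illustration. The rule is deliberately crude: if the sentence has no other verb, the last 's is read as a contraction of "is"; every other 's is read as a possessive. Compound words are not handled.

```python
import re

# Hypothetical, tiny verb list used only for this illustration.
KNOWN_VERBS = {"is", "are", "was", "has", "can", "likes", "eats"}


def split_apostrophe_s(sentence):
    """Tokenize on whitespace and detach a trailing 's into its own token."""
    tokens = []
    for word in sentence.rstrip(".!?").split():
        match = re.fullmatch(r"(.+)'s", word)
        if match:
            tokens.extend([match.group(1), "'s"])
        else:
            tokens.append(word)
    return tokens


def label_apostrophe_s(tokens):
    """Map the index of each 's token to "possessive" or "is".

    Assumed heuristic: a sentence needs a verb, so if none of the other
    tokens is a known verb, the last 's must be standing in for "is".
    """
    has_verb = any(token.lower() in KNOWN_VERBS for token in tokens)
    s_positions = [i for i, token in enumerate(tokens) if token == "'s"]
    labels = {}
    for i in s_positions:
        if not has_verb and i == s_positions[-1]:
            labels[i] = "is"
        else:
            labels[i] = "possessive"
    return labels
```

With this sketch, "Anna's cat is hungry." yields a possessive 's, while "Anna's hungry." yields an 's that means "is". A real parser would need far more context than a verb list to make that call reliably.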

This feature is also tricky from an interpretation and storage perspective. Until now, Acuitas has only needed to store information consisting of links joining two concepts. "<A> is_a <B>." "<A> can_do <B>." But a possessive isn't quite like that. Oh, I could force it to be that way:

<A> is_parent_of <B>
<C> is_sibling_of <B>
<D> is_pet_of <B>
<E> is_property_of <B>

But then I'd end up with a proliferation of manually-defined link types for ALL the possible relationships that can be expressed by a possessive, and I didn't want to go there.

I also could have opted to break the information up into two separate links:

"A is B's parent" --> <A> belongs_to <B> + <A> is_a <parent>

The problem here is that <A> might, for instance, be a sibling too. So in which capacity does <A> belong to <B>? We couldn't tell.

So I ended up defining Acuitas' very first type of three-ended link:

<A> is <C> of <B>

to handle all the possessives.
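A three-ended link of this shape could be stored along the following lines. This is only a sketch under assumptions: `PossessiveLink`, `PossessiveMemory`, and the example names are hypothetical and say nothing about how Acuitas' memory is really laid out. The point is that the role travels with the link, so the "in which capacity?" question always has an answer.

```python
from collections import namedtuple

# One three-ended link: "<subject> is <role> of <owner>".
PossessiveLink = namedtuple("PossessiveLink", ["subject", "role", "owner"])


class PossessiveMemory:
    """Toy store for three-ended possessive links (illustrative only)."""

    def __init__(self):
        self.links = set()

    def add(self, subject, role, owner):
        self.links.add(PossessiveLink(subject, role, owner))

    def roles_of(self, subject, owner):
        """Return every role in which `subject` belongs to `owner`."""
        return sorted(
            link.role
            for link in self.links
            if link.subject == subject and link.owner == owner
        )


memory = PossessiveMemory()
memory.add("Anna", "parent", "Ben")    # "Anna is Ben's parent."
memory.add("Anna", "sibling", "Carl")  # "Anna is Carl's sibling."
```

Here `memory.roles_of("Anna", "Ben")` gives only `["parent"]`, even though Anna is also somebody's sibling. With two separate two-ended links, that distinction would be lost.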

The feature is still somewhat incomplete. For now, he can only properly detect and store possessive links that join proper names. That's because the proper name is how he detects that something is an instance rather than a class. He can't yet build memory nodes for abstract instances like "somebody's cat," because he still thinks that "cat" is strictly a type of thing.

Everything is buggy as all get out right now, too ... but the way this summer has gone, it was about the best I could do.

Part of the semantic memory visualization. The largest dot left-of-center is Acuitas' self-concept.

Code base: 11706 lines
Words known: 2425 (approx.)
Concept-layer links: 6754
