I haven't done a diary in a while
because I've been busy overhauling the text parser. And that's not
very interesting to talk about, because it's just me taking features
the parser already had and implementing them in a different way. I'd
decided that the scheme I was using had some fundamental deficiencies
and should be re-worked in order to become closer to the way I
actually read, and to better enable future expansion. I was hoping
this would go quickly, given the experience I had already gained from
writing the previous text parser ... but no. It turned out to be a
major project that took far longer than the time I'd
allocated for it.
There are a few new enhancements that I
managed to squeeze in along the way, however:
* Some infinitives are now handled
properly. (Sentence example: "A pot is used to cook.")
Before, Acuitas treated them like any other prepositional phrase,
which meant that the word inside got marked as a "noun."
It's now marked as a verb.
* Previously, Acuitas could store links
indicating what actions an entity can do. I added a new link
type for the actions an entity can have done TO it, and the ability
to detect some sentences that express that information. E.g. "An
object can be pushed."
* I've started working on possessives
and family/ownership relationship links. From a parsing perspective,
this is tricky, because English has this lovely "apostrophe s"
thing that can either turn a word into a possessive, or turn it into
a contraction with "is." If a word ends in 's, I have the
parser crack it apart and treat the 's like its own separate word,
which then gets assigned a meaning based on what the rest of the
sentence is doing. Compound words with the 's appended only to the
last one offer even more fun.
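To make the "crack it apart" step concrete, here is a minimal Python sketch. It is not Acuitas' actual parser code: the function names, the tiny verb list, and the "is there already a finite verb elsewhere in the sentence?" rule are all assumptions made up for illustration, and the rule is far cruder than a real parser would need.

```python
# Toy sketch only: the names and the verb list are illustrative assumptions.
FINITE_VERBS = {"is", "are", "was", "were", "has", "have", "can"}

def split_apostrophe_s(sentence):
    """Tokenize on whitespace and detach a trailing 's as its own token."""
    tokens = []
    # Normalize the typographic apostrophe so both spellings match.
    for word in sentence.replace("\u2019", "'").split():
        core = word.rstrip(".,!?;:")  # drop trailing punctuation
        if len(core) > 2 and core.lower().endswith("'s"):
            tokens.extend([core[:-2], "'s"])
        elif core:
            tokens.append(core)
    return tokens

def guess_role(tokens, index):
    """Crude guess at what the 's at tokens[index] means.

    If the rest of the sentence already contains a finite verb, treat
    the 's as a possessive; otherwise assume it stands in for "is".
    """
    others = tokens[:index] + tokens[index + 1:]
    if any(tok.lower() in FINITE_VERBS for tok in others):
        return "possessive"
    return "contraction"
```

With this sketch, "Anna's cat is gray." splits into Anna / 's / cat / is / gray and the 's is guessed to be a possessive, while "Anna's happy." has no other verb, so the 's is guessed to be a contraction of "is."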
This feature is also tricky from
an interpretation and storage perspective. Until now, Acuitas has
only needed to store information consisting of links joining two
concepts: "<A> is_a <B>," "<A>
can_do <B>." But a possessive isn't quite like that. Oh,
I could force it to be that way:
<A> is_parent_of <B>
<C> is_sibling_of <B>
<D> is_pet_of <B>
<E> is_property_of <B>
But then I'd end up with a
proliferation of manually-defined link types for ALL the possible
relationships that can be expressed by a possessive, and I didn't
want to go there.
I also could have opted to break the
information up into two separate links:
"A is B's parent" --> <A>
belongs_to <B> + <A> is_a <parent>
The problem here is that <A>
might, for instance, be a sibling too. So in which capacity does <A>
belong to <B>? We couldn't tell.
So I ended up defining Acuitas' very
first type of three-ended link:
<A> is <C> of <B>
to handle all the possessives.
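As a rough illustration of why the third end matters, here is a small Python sketch of a three-ended link store. The class, field, and method names are hypothetical and say nothing about how Acuitas' memory is really implemented; the point is only that the role travels with the link, so one entity can relate to different entities in different capacities without ambiguity.

```python
from collections import namedtuple

# Hypothetical three-ended link: <owned> is <role> of <owner>.
PossessiveLink = namedtuple("PossessiveLink", ["owned", "role", "owner"])

class Memory:
    """Toy store for possessive links (illustrative only)."""

    def __init__(self):
        self.possessives = set()

    def add_possessive(self, owned, role, owner):
        """Record '<owned> is <role> of <owner>'."""
        self.possessives.add(PossessiveLink(owned, role, owner))

    def roles_of(self, owned, owner):
        """Return every capacity in which `owned` belongs to `owner`."""
        return sorted(
            link.role
            for link in self.possessives
            if link.owned == owned and link.owner == owner
        )

mem = Memory()
mem.add_possessive("Anna", "parent", "Ben")    # "Anna is Ben's parent"
mem.add_possessive("Anna", "sibling", "Cara")  # "Anna is Cara's sibling"
```

Here `mem.roles_of("Anna", "Ben")` gives only `["parent"]`, even though Anna is also somebody's sibling. That is exactly the information the two-link split would lose.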
The feature is still somewhat
incomplete. For now, Acuitas can only properly detect and store possessive
links that join proper names. That's because a proper name is how
he detects that something is an instance rather than a class.
He can't yet build memory nodes for abstract instances like
"somebody's cat," so he still thinks that "cat" is
strictly a type of thing.
Everything is buggy as all get out
right now, too ... but the way this summer has gone, it was about the
best I could do.
Part of the semantic memory visualization. The largest dot left-of-center is Acuitas' self-concept.
Code base: 11706 lines
Words known: 2425 (approx.)
Concept-layer links: 6754