My focus this past month was on giving
Acuitas the ability to learn more types of inter-word relationships,
and that meant doing some work in what I call the “Text
Interpreter” … the module downstream from the Text Parser.
The Parser attempts to tag each word in
the input with its part of speech and determine its function within
the input. Basically, it figures out all the information you'd need
to diagram a sentence. Extracting actual meaning takes more work
beyond that, and the Interpreter handles it. Consider some of the
possible ways of expressing the idea that a cat belongs to the
category “animal”:
A cat is an animal.
Cats are animals.
A cat is a type of animal.
One type of animal is a cat.
A cat is among the animals.
By removing the content words and
abstracting away some grammatical information, it's possible to
generalize these into sentence skeletons that describe the legal ways
of saying “X is in category Y” in English:
[A] <subject> <be-verb> [a] <direct object>
[A] <subject> <be-verb> a <subcategory word> of <object-of-preposition>
One <subcategory word> of <object-of-preposition> <be-verb> [a] <direct object>
[A] <subject> <be-verb> among the <object-of-preposition>
I've nicknamed these syntactic
structures “forms.” The Interpreter's job is to detect forms and
match them to concept-linking relationships. As the previous example
should have shown, a single relationship such as class membership can
be expressed by multiple forms, each of which has numerous possible
variations of word choice, etc.
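To make the idea of forms concrete, here is a minimal sketch of how skeleton matching over parser output might look. It is not Acuitas' actual code: the `Token`, `Lit`, and `Role` types, the role tag strings, the `SUBCATEGORY` word list, and the `is_a` relation name are all assumptions made for illustration, and no plural handling is attempted.

```python
from dataclasses import dataclass
from typing import NamedTuple, Optional


class Token(NamedTuple):
    word: str
    role: str  # hypothetical parser tag, e.g. "subject"; "" if untagged


@dataclass
class Lit:
    """Matches one literal word from a small set (e.g. an article)."""
    words: tuple
    optional: bool = False


@dataclass
class Role:
    """Matches any token carrying the given role; may capture its word."""
    role: str
    capture: Optional[str] = None


ARTICLES = ("a", "an")
SUBCATEGORY = ("type", "kind", "sort")  # assumed list of subcategory words

# Each entry pairs a relationship name with one sentence skeleton.
# "x" is the member and "y" is the category in "X is in category Y".
FORMS = [
    ("is_a", [Lit(ARTICLES, True), Role("subject", "x"), Role("be-verb"),
              Lit(ARTICLES, True), Role("direct-object", "y")]),
    ("is_a", [Lit(ARTICLES, True), Role("subject", "x"), Role("be-verb"),
              Lit(ARTICLES), Lit(SUBCATEGORY), Lit(("of",)),
              Role("object-of-preposition", "y")]),
    ("is_a", [Lit(("one",)), Lit(SUBCATEGORY), Lit(("of",)),
              Role("object-of-preposition", "y"), Role("be-verb"),
              Lit(ARTICLES, True), Role("direct-object", "x")]),
    ("is_a", [Lit(ARTICLES, True), Role("subject", "x"), Role("be-verb"),
              Lit(("among",)), Lit(("the",)),
              Role("object-of-preposition", "y")]),
]


def match_form(elements, tokens):
    """Return the captured words if the whole token list fits, else None."""
    def step(ei, ti, captured):
        if ei == len(elements):
            # Every element is used up; succeed only if every token is too.
            return captured if ti == len(tokens) else None
        el = elements[ei]
        if isinstance(el, Lit):
            if ti < len(tokens) and tokens[ti].word.lower() in el.words:
                result = step(ei + 1, ti + 1, captured)
                if result is not None:
                    return result
            # An optional literal may also be skipped entirely.
            return step(ei + 1, ti, captured) if el.optional else None
        if ti < len(tokens) and tokens[ti].role == el.role:
            updated = dict(captured)
            if el.capture:
                updated[el.capture] = tokens[ti].word.lower()
            return step(ei + 1, ti + 1, updated)
        return None

    return step(0, 0, {})


def interpret(tokens):
    """Return (relation, x, y) for the first matching form, else None."""
    for relation, elements in FORMS:
        captured = match_form(elements, tokens)
        if captured is not None:
            return (relation, captured["x"], captured["y"])
    return None


sentence = [Token("A", ""), Token("cat", "subject"), Token("is", "be-verb"),
            Token("an", ""), Token("animal", "direct-object")]
print(interpret(sentence))  # ('is_a', 'cat', 'animal')
```

Keeping the forms in a plain data table rather than in hand-written conditionals means a new way of phrasing a relationship is just one more entry in `FORMS`.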
Up until now, the only links Acuitas could add to his database were
class memberships (<thing> is a <thing>) and qualities
(<thing> is <descriptive word>), plus their negations, and he only
recognized a single form for each. I overhauled the form detection
method, making it more powerful and general, and making it easier to
add new forms to the code. Then I added more forms and support for a
number of new link relationships, including:
<thing> can do <action>
<thing> is for <action>
<thing> is part of <thing>
<thing> is made of <thing>
<thing> has <thing>
The first two are
particularly important, since they mean he can finally start learning
some generic verbs.
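As a rough illustration of what typed links with negations could look like in storage, here is a small sketch. The `ConceptLinks` class, its method names, and the relation identifiers are hypothetical and are not taken from Acuitas' database; the point is only that verbs show up as the targets of "can do" and "is for" links.

```python
# Hypothetical relation identifiers, one per link type named in the post.
RELATIONS = {"is_a", "has_quality", "can_do", "is_for",
             "part_of", "made_of", "has"}


class ConceptLinks:
    """A toy in-memory store of (source, relation, target, negated) links."""

    def __init__(self):
        self._links = set()

    def add(self, source, relation, target, negated=False):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self._links.add((source, relation, target, negated))

    def query(self, source=None, relation=None, target=None):
        """Return all links matching the fields given, in sorted order."""
        return sorted(
            link for link in self._links
            if (source is None or link[0] == source)
            and (relation is None or link[1] == relation)
            and (target is None or link[2] == target)
        )

    def known_actions(self):
        """Verbs learned so far: targets of can_do and is_for links."""
        return sorted({target for _, relation, target, _ in self._links
                       if relation in ("can_do", "is_for")})


db = ConceptLinks()
db.add("cat", "is_a", "animal")
db.add("cat", "can_do", "sleep")
db.add("knife", "is_for", "cut")
db.add("cat", "made_of", "metal", negated=True)  # "a cat is not made of metal"
print(db.known_actions())  # ['cut', 'sleep']
```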
I spent the latter half of the month
upgrading Acuitas' GUI library from Tkinter to Kivy. This was a
somewhat unwelcome distraction from real development work, but it had
to be done. Acuitas is a multi-threaded program, and using multiple
threads with Tkinter is ... not straightforward. As the program grew
more complex, my hacky method of letting all the threads update the
GUI was becoming increasingly unsupportable and causing instability.
Of course Kivy does just about everything differently, so porting all
of the GUI elements I'd developed was a serious chore -- but the new
version looks slick and, most importantly, doesn't crash. All the
drawn graphics have anti-aliasing now, which makes the memory
visualizations look nicer when zoomed out.
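One common way to keep multi-threaded GUI updates stable, sketched here with only the standard library: worker threads never touch widgets directly, but post callbacks to a queue that the GUI thread drains. `GuiUpdateQueue` is a hypothetical name and this is not Acuitas' actual code; Kivy also ships its own tools for the same job, such as `Clock.schedule_once` and the `mainthread` decorator in `kivy.clock`.

```python
import queue
import threading


class GuiUpdateQueue:
    """Collects GUI updates from any thread; only one thread runs them."""

    def __init__(self):
        self._pending = queue.Queue()  # queue.Queue is thread-safe

    def post(self, callback, *args):
        """Safe to call from any worker thread."""
        self._pending.put((callback, args))

    def drain(self):
        """Call from the GUI thread only. Returns how many updates ran."""
        count = 0
        while True:
            try:
                callback, args = self._pending.get_nowait()
            except queue.Empty:
                return count
            callback(*args)
            count += 1


updates = GuiUpdateQueue()
shown = []  # stands in for a widget that only the GUI thread may modify

workers = [threading.Thread(target=updates.post, args=(shown.append, n))
           for n in range(5)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

ran = updates.drain()  # in a real app the GUI loop would call this regularly
```

In a real application the GUI toolkit's own timer would call `drain()` on every tick, so all widget changes happen on one thread regardless of how many workers exist.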
Code base: 6361 lines
Words known: 896
Concept-layer links: 1474