I spent the past two months revisiting
the text parser, with the big goal this time around of adding support
for dependent clauses. In case anyone's high school grammar is rusty,
a clause is a subject/verb pair and any words associated with them; a
dependent clause is one that is part of another clause and can't be a
sentence by itself. Previously, Acuitas could handle one subject and
one verb group per sentence, and that was it.
(Image caption: Because my own code comments amuse me ...)
After last year's feverish round of
development, I left the text parser a mess and never wanted to look
at it again. So the first thing I had to do was clean up the
disastrous parts. I ended up giving some of the functions another
serious overhaul, and got some code that is (I think) actually
maintainable and comprehensible. Whew.
Next, the clauses. The fun thing here
is that dependent clauses have a function in the sentence (e.g. a
clause can be the subject or direct object of its parent sentence).
For simplicity, my initial text parser worked on the premise that a
functional group in the sentence could only be a single word, or a
compound word with all members marked as the same part of speech. I
had to put in a bunch of new structure to preserve the information
inside the clauses, while also marking the whole clause as a
functional group … plus, detecting multiple subject/verb pairs and
keeping them all straight.
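
To make the idea concrete, here is a minimal Python sketch of one way a clause could be stored as a functional group while keeping its internal structure. It is an illustration only, not the actual Acuitas code: the names `Word`, `Clause`, `complement`, `marker`, `count_clauses`, and `render` are all hypothetical, and the part-of-speech labels are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Word:
    """A single token tagged with a part of speech (labels are placeholders)."""
    text: str
    pos: str


@dataclass
class Clause:
    """A subject/verb pair plus associated words (hypothetical layout).

    A slot member is either a Word or a nested Clause, so an entire
    dependent clause can act as one functional group in its parent.
    """
    subject: List["Member"] = field(default_factory=list)
    verb: List[Word] = field(default_factory=list)
    # Direct object or other complement of the verb.
    complement: List["Member"] = field(default_factory=list)
    # Introducing word such as "that" or "if"; empty for a main clause.
    marker: str = ""


Member = Union[Word, Clause]


def count_clauses(clause: Clause) -> int:
    """Count this clause plus every clause nested inside it."""
    total = 1
    for member in clause.subject + clause.complement:
        if isinstance(member, Clause):
            total += count_clauses(member)
    return total


def render(clause: Clause) -> str:
    """Rebuild the text, wrapping each nested clause in brackets."""
    parts = [clause.marker] if clause.marker else []
    for member in clause.subject + clause.verb + clause.complement:
        if isinstance(member, Clause):
            parts.append("[" + render(member) + "]")
        else:
            parts.append(member.text)
    return " ".join(parts)


# "I know [that a cheetah is an animal]."
inner = Clause(
    subject=[Word("a", "article"), Word("cheetah", "noun")],
    verb=[Word("is", "verb")],
    complement=[Word("an", "article"), Word("animal", "noun")],
    marker="that",
)
outer = Clause(
    subject=[Word("I", "pronoun")],
    verb=[Word("know", "verb")],
    complement=[inner],  # the whole clause fills one slot of its parent
)

print(count_clauses(outer))  # two subject/verb pairs in one sentence
print(render(outer))
```

The point of the nesting is that the parent only sees one complement, while the words inside the dependent clause keep their own subject and verb roles.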
What does this achieve? Some sentence
types that are very important for reasoning use dependent clauses.
For instance, sentences that discuss subordinate pieces of knowledge:
I know [that a cheetah is an animal].
I told you [that a grape can be eaten].
I fear [that the car broke yesterday].
And sentences that express conditional
information:
[If a berry is green], it is unripe.
[If you eat that berry], you will get sick.
The gun will fire [if you pull the trigger].
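
As a rough illustration of why these sentences matter for reasoning, the short Python sketch below pulls the bracketed dependent clause out of an annotated example and separates its introducing word from the rest. It works only on the bracket notation used in this post, not on raw text, and it is not how the Acuitas parser operates; `split_annotated` is a hypothetical helper.

```python
import re


def split_annotated(sentence):
    """Split a bracket-annotated example into (marker, dependent, main).

    Assumes exactly one [bracketed] dependent clause, as in the
    examples above. Returns None if no brackets are present.
    """
    match = re.search(r"\[([^\]]+)\]", sentence)
    if match is None:
        return None
    words = match.group(1).split()
    marker = words[0].lower()       # e.g. "if" or "that"
    dependent = " ".join(words[1:])  # the clause without its marker
    # Whatever lies outside the brackets is the main clause.
    remainder = sentence[:match.start()] + " " + sentence[match.end():]
    main = " ".join(remainder.split()).strip(" ,.")
    return marker, dependent, main


print(split_annotated("[If a berry is green], it is unripe."))
print(split_annotated("The gun will fire [if you pull the trigger]."))
```

For a conditional, the dependent part is the condition and the main part is the consequence, whichever order they appear in.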
Not to mention that normal human
speech and writing are riddled with dependent clauses, so interpreting
them is a must for a conversational AI.
Acuitas can parse sentences like the
ones above now, but doesn't really do anything with them yet. That
will come later and require updates to the high-level conversation
management code.
Code base: 15600 lines
Words known: 2884 (approx.)
Concept-layer links: 7915