I spent the past two months revisiting the text parser, with the big goal this time around of adding support for dependent clauses. In case anyone's high school grammar is rusty, a clause is a subject/verb pair and any words associated with them; a dependent clause is one that is part of another clause and can't be a sentence by itself. Previously, Acuitas could handle one subject and one verb group per sentence, and that was it.
[Image: Because my own code comments amuse me ...]
After last year's feverish round of development, I left the text parser a mess and never wanted to look at it again. So the first thing I had to do was clean up the disastrous parts. I ended up giving some of the functions another serious overhaul, and got some code that is (I think) actually maintainable and comprehensible. Whew.
Next, the clauses. The fun thing here is that dependent clauses have a function in the sentence (e.g. a clause can be the subject or direct object of its parent sentence). For simplicity, my initial text parser worked on the premise that a functional group in the sentence could only be a single word, or a compound word with all members marked as the same part of speech. I had to put in a bunch of new structure to preserve the information inside the clauses, while also marking the whole clause as a functional group … plus, detecting multiple subject/verb pairs and keeping them all straight.
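To make the idea concrete, here is a minimal sketch of one way a functional group could hold either a run of words or a whole nested clause. This is not Acuitas's actual code: the names `Word`, `Group`, `Clause`, `words` and `count_clauses` are all hypothetical, invented for illustration, and the real parser's structures may look quite different.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Word:
    text: str
    pos: str  # part-of-speech tag, e.g. "noun" or "verb"


@dataclass
class Clause:
    """One subject/verb pair plus the words associated with it."""
    subject: Optional["Group"] = None
    verb: Optional["Group"] = None
    obj: Optional["Group"] = None   # direct object or other complement
    marker: Optional[str] = None    # subordinating word such as "that" or "if"


@dataclass
class Group:
    """A functional group: a run of words OR an entire nested clause."""
    content: Union[List[Word], Clause]

    def is_clause(self) -> bool:
        return isinstance(self.content, Clause)


def words(*pairs) -> Group:
    """Hypothetical helper: build a word group from (text, pos) pairs."""
    return Group([Word(text, pos) for text, pos in pairs])


def count_clauses(clause: Clause) -> int:
    """Count this clause plus any clauses nested inside its groups."""
    total = 1
    for group in (clause.subject, clause.verb, clause.obj):
        if group is not None and group.is_clause():
            total += count_clauses(group.content)
    return total


# "I know [that a cheetah is an animal]."
inner = Clause(
    subject=words(("a", "article"), ("cheetah", "noun")),
    verb=words(("is", "verb")),
    obj=words(("an", "article"), ("animal", "noun")),
    marker="that",
)
sentence = Clause(
    subject=words(("I", "pronoun")),
    verb=words(("know", "verb")),
    obj=Group(inner),  # the whole dependent clause fills the object slot
)
```

In this sketch the dependent clause keeps its own subject and verb, while the parent sentence sees it as a single group in the object slot, so `count_clauses(sentence)` finds both subject/verb pairs.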
What does this achieve? Some sentence types that are very important for reasoning use dependent clauses. For instance, sentences that discuss subordinate pieces of knowledge:
I know [that a cheetah is an animal].
I told you [that a grape can be eaten].
I fear [that the car broke yesterday].
And sentences that express conditional information:
[If a berry is green], it is unripe.
[If you eat that berry], you will get sick.
The gun will fire [if you pull the trigger].
Not to mention that normal human speaking/writing is riddled with dependent clauses, so interpreting them is a must for a conversational AI.
Acuitas can parse sentences like the ones above now, but doesn't really do anything with them yet. That will come later and require updates to the high-level conversation management code.
Code base: 15600 lines
Words known: 2884 (approx.)
Concept-layer links: 7915