My first objective for the new year was enabling the Text Parser to handle lists or conjunction groups with more than two items. For quite a while now, Acuitas' parser has been equipped to handle sentences like this:
Jack and Jill went up the hill.
But a sentence like *this* would hopelessly confuse it:
Jack, Jill, and John went up the hill.
I started out by only handling pairs because that makes it simpler to discern which parts belong in a list/group and which don't; you only have to look at the sentence elements that bracket the conjunction. I figured I would expand to longer lists later. But once I got here, well, I reused some of my previous work but decided on a pretty extensive overhaul. I'll try to explain what my options were and why I chose to change course.
One way of dealing with a list is to encapsulate it. For example, most sentences have a subject, the thing that's doing the action, and part of the Parser's job is to determine which word is the subject and tag it; "(subj, Jack)->(verb, went)." If you have a list of subjects (as in "Jack, Jill, and John went up the hill"), you can bundle them into a compound subject and tag that. So the parsed sentence becomes something like "(subj, <list>)->(verb, went)," and you can open up <list> and see that it contains Jack, Jill and John. I was already handling sentences with dependent clauses this way (e.g. "What you need is a blanket" becomes "(subj, <depcl>) is a blanket").
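To make the encapsulation idea concrete, here is a minimal Python sketch. It is an illustration only, not the Parser's actual code: the names `Word`, `ListGroup`, `Tagged`, `render`, and `unpack` are all hypothetical, and the sketch assumes the list members have already been identified.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Word:
    text: str


@dataclass
class ListGroup:
    """A bundle of list members that is tagged as a single unit (hypothetical)."""
    members: List[Word]
    conjunction: str = "and"


@dataclass
class Tagged:
    """One sentence element: a role plus either a word or a whole list."""
    role: str
    content: Union[Word, ListGroup]


def render(sentence: List[Tagged]) -> str:
    """Show the top-level structure, hiding what is inside any list."""
    parts = []
    for element in sentence:
        if isinstance(element.content, ListGroup):
            label = "<list>"
        else:
            label = element.content.text
        parts.append(f"({element.role}, {label})")
    return "->".join(parts)


def unpack(element: Tagged) -> List[str]:
    """Open an element up and return the words it contains."""
    if isinstance(element.content, ListGroup):
        return [member.text for member in element.content.members]
    return [element.content.text]


sentence = [
    Tagged("subj", ListGroup([Word("Jack"), Word("Jill"), Word("John")])),
    Tagged("verb", Word("went")),
]
print(render(sentence))     # (subj, <list>)->(verb, went)
print(unpack(sentence[0]))  # ['Jack', 'Jill', 'John']
```

The point of the design is that anything looking only at the top level sees one subject, no matter how many members the list has.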
Another possibility is to imagine the sentence structure like a railway line. Subject connects to verb connects to direct object and indirect object, and if some of those are multiple, the line will branch or merge. Our previous example would look something like this:
(subj, Jack) -
              \
(subj, Jill) --->(verb, went)
              /
(subj, John) -
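For comparison, the "line" picture can be sketched as a small graph in which every subject keeps its own tag and simply points at the verb. Again, this is an assumed illustration rather than the Parser's real data structure; `Node`, `connect`, and `feeders` are hypothetical names.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (role, word)


def connect(sources: List[Node], target: Node) -> Dict[Node, List[Node]]:
    """Draw one edge from every source node to the shared target node."""
    return {source: [target] for source in sources}


def feeders(edges: Dict[Node, List[Node]], target: Node) -> List[Node]:
    """Find every node whose line merges into the given target."""
    return [source for source, targets in edges.items() if target in targets]


edges = connect(
    [("subj", "Jack"), ("subj", "Jill"), ("subj", "John")],
    ("verb", "went"),
)
for source, targets in edges.items():
    for target in targets:
        print(f"{source} -> {target}")
```

Notice that nothing in this structure represents "the list" itself: to learn that Jack, Jill, and John belong together, you have to search for all the lines that merge into the same verb, as `feeders` does.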
I had previously been using the "encapsulation" method for a few things (like lists of adjectives), but I used the "line" method for the main sentence structure, because I thought I needed it to handle some of the more complex cases. Lists of single words are the easy ones. You can also have lists of verb-object groups:
I threw out the soup, ate the pizza, and saved the cake.
You can have lists of verbs in which some attach to the direct object and some don't:
Brent ran and threw the javelin.
Occasionally, you can have lists of subject-verb groups that converge on a single object:
Are you or are you not a teacher?
I had concluded that parsing the sentence into a branching type of structure was the only way to deal with groups that spanned words with different roles (because otherwise, how would I assign the list a single role in the full sentence?). But there are also distinct disadvantages to not treating the members of a list as a unit, and once I got into lists longer than two, those began to feel overwhelming. So I opted to switch everything over to the "encapsulation" method.
How *did* I handle groups containing multiple roles, then? I realized I could decree that the role of the list in the main sentence would be "verb." This works because a verb is really the one thing that every sentence needs. Some sentences only have an implied subject, and objects are always optional. So lists of subj-verb groups, lists of verb-obj or mixed verb and verb-obj groups, and even lists of subj-verb-obj groups, can all become "verbs" at the top level of the hierarchy, and only unpacking them need reveal their deeper structure.
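Here is a rough sketch of that rule in Python. It is a simplification under stated assumptions, not the real implementation: each list member is represented as a plain dict mapping role names to words, and the function name `list_role` and the role label `dobj` are made up for the example.

```python
from typing import Dict, List


def list_role(members: List[Dict[str, str]]) -> str:
    """Pick the role a list plays at the top level of the sentence.

    Assumed rule: a list whose members all fill the same single role keeps
    that role; a list of multi-role groups is treated as a "verb", which
    requires every member to contain a verb.
    """
    if not members:
        raise ValueError("a list needs at least one member")
    role_sets = [set(member) for member in members]
    shared = set.union(*role_sets)
    if len(shared) == 1 and all(len(roles) == 1 for roles in role_sets):
        return next(iter(shared))
    if all("verb" in roles for roles in role_sets):
        return "verb"
    raise ValueError("mixed-role list with a member that has no verb")


# "Jack, Jill, and John went up the hill."
print(list_role([{"subj": "Jack"}, {"subj": "Jill"}, {"subj": "John"}]))  # subj

# "I threw out the soup, ate the pizza, and saved the cake."
print(list_role([
    {"verb": "threw out", "dobj": "soup"},
    {"verb": "ate", "dobj": "pizza"},
    {"verb": "saved", "dobj": "cake"},
]))  # verb

# "Brent ran and threw the javelin."
print(list_role([{"verb": "ran"}, {"verb": "threw", "dobj": "javelin"}]))  # verb
```

The deeper structure (which verb owns which object) stays inside the members and is only visible once the list is unpacked.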
Aside from this conversion, there was a fair bit of new development work I did to detect lists and figure out where their boundaries are. There are plenty of (not) fun ambiguities involved, like this one:
For dinner, Sue and James brought a pot pie.
A clumsy parser might assume that "dinner, Sue and James" is a list that forms the object of the preposition "for," then be left wondering where the subject of the sentence is.
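As a toy illustration of the kind of check that avoids this trap, the sketch below tries the widest possible list first and rejects any reading that would leave the verb without a subject. This is a deliberately simple heuristic of my own for the example, not a description of how the Parser actually resolves the ambiguity; it assumes the tokens arrive with hypothetical part-of-speech tags already attached, and that the sentence has exactly one conjunction joining nouns.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, assumed part-of-speech tag)


def candidate_starts(tokens: List[Token], conj: int) -> List[int]:
    """Walk left from the conjunction over 'noun, noun, ...'; widest start first."""
    starts = []
    i = conj - 1
    if i >= 0 and tokens[i][1] == "comma":  # skip a comma right before "and"
        i -= 1
    while i >= 0 and tokens[i][1] == "noun":
        starts.append(i)
        if i - 1 >= 0 and tokens[i - 1][1] == "comma":
            i -= 2
        else:
            break
    return starts[::-1]


def leaves_subject(tokens: List[Token], start: int, conj: int) -> bool:
    """Reject a reading that turns the list into a prepositional object
    and leaves no noun between the list and the verb to act as subject."""
    if start == 0 or tokens[start - 1][1] != "prep":
        return True
    verb = next((k for k, (_, tag) in enumerate(tokens) if tag == "verb"), None)
    if verb is None or verb < start:
        return True
    end = conj + 1  # the last list member follows the conjunction
    return any(tag == "noun" for _, tag in tokens[end + 1:verb])


def find_list(tokens: List[Token]) -> List[str]:
    """Return the members of the widest list reading that keeps a subject."""
    conj = next(k for k, (_, tag) in enumerate(tokens) if tag == "conj")
    for start in candidate_starts(tokens, conj):
        if leaves_subject(tokens, start, conj):
            return [word for word, tag in tokens[start:conj + 2] if tag == "noun"]
    return []


dinner = [("For", "prep"), ("dinner", "noun"), (",", "comma"), ("Sue", "noun"),
          ("and", "conj"), ("James", "noun"), ("brought", "verb"),
          ("a", "det"), ("pot", "noun"), ("pie", "noun")]
hill = [("Jack", "noun"), (",", "comma"), ("Jill", "noun"), (",", "comma"),
        ("and", "conj"), ("John", "noun"), ("went", "verb"),
        ("up", "prep"), ("the", "det"), ("hill", "noun")]

print(find_list(dinner))  # ['Sue', 'James']
print(find_list(hill))    # ['Jack', 'Jill', 'John']
```

Real sentences need far more than this (the tags themselves are often ambiguous), but it shows why list detection can't just grab everything between the commas.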
I haven't recovered the full functionality of the former Parser where pairs of groups were concerned (I'll pick at that gradually while moving on to other topics), but that's balanced by the capacity to handle longer lists in quite a few scenarios. This was one of the last major missing features of the Parser, and a heavy weight on my mind. So it feels wonderful to finally have this capability in place.
Until the next cycle,
Jenny
