I've continued splitting my development time for the month between the Narrative module and something else. This month the "something else" was the Text Parser. On the Narrative front, I am still working on the "big" story, and at this point I can't think of any major new features to talk about; it's been mainly a matter of adding sentences to the story, then making sure the needed words and/or facts are in the database and all the bugs are wrung out so the narrative understanding "works." I'm eager to reveal the final product, but it'll be a while yet!
The Parser goal for this month was adding basic support for gerunds and participles. The common factor between these is that they're both verb phrases used as some other part of speech. So detecting them takes extra effort because they must be distinguished from verbs that are actually functioning as verbs. In case you're not familiar with these grammar minutiae, here are some examples:
Singing is one of my pleasures. (Gerund as subject)
I don't enjoy eating bacon. (Gerund as direct object)
Are you sure of winning? (Gerund as object of preposition)
The dog, eating busily, resisted my efforts to pull the dish away. (Participle modifying subject)
The winning team came back onto the field. (Participle modifying subject)
I met a man named Bill. (Participle modifying direct object)
Helping verbs often accompany these forms when they are truly acting as verbs, and their absence is one clue to the possibility of a gerund or participle. Sometimes punctuation also provides a hint. Otherwise, gerunds and participles must be identified by their relationship (positional, and perhaps also semantic) to other words in the sentence.
After adding support for the new phrase types, I re-ran the Text Parser benchmarks. I also added a new test set, consisting of sentences from Log Hotel by Anne Schreiber. This children's book has simpler sentences than the other examples from which I derived test materials, while still not leaning too hard on the illustrations to convey its message.
I'm pleased with the results, even though progress may still seem slow. Both original test sets (The Magic Schoolbus: Inside the Earth and Out of the Dark) now show roughly 75% of sentences parseable (i.e. the Parser supports all grammatical constructs needed to construct a correct golden parse for the sentence), and 50% or more parsing correctly. Log Hotel has an even higher parseable rate, but a lower correct rate. Despite the "easy" reading level, it still does complex things with conjunctions and presents a variety of ambiguity problems (most of which I haven't even started trying to address yet).
Pie charts of parser success on sentences from the three text examples I currently have. |
To address the remaining unparseable sentences, I've got adjective clauses, noun-phrases-used-as-adverbs, and parenthetical noun phrases on my list. A full-featured Text Parser is beginning to feel close.
Until the next cycle,
Jenny
No comments:
Post a Comment