Monday, September 29, 2025

Acuitas Diary #88 (September 2025)

This month I returned to the Text Parser after letting it be for almost a year. My focus was on nailing the final major feature I needed to handle all the sentences in my three children's book benchmarks: "parenthetical noun phrases." I don't know if that's the technical term, but that's what I'm calling them. They come after another noun and provide further description or elaboration of it, like this:

I was brought to see Philip Erto, the great engineer.
I was brought to see the great engineer, Philip Erto.

In both examples above, the "parenthetical noun phrase" appears at the end of the sentence, and is paired with the direct object of "see." In this case, the noun phrase that acts as the direct object and the noun phrase that acts as the parenthetical elaboration are interchangeable - the order depends on the speaker's desired emphasis.

Notice also that the same meaning can be captured by a dependent adjective clause instead:

I was brought to see Philip Erto, who is a great engineer.
I was brought to see the great engineer whose name is Philip Erto.

In the Text Interpreter, I can reduce both the parenthetical noun phrases and the dependent adjective clauses to the same output: extra semantic relationships, such as "Philip Erto <is-a> engineer <has-quality> great." But the Parser is the first stage of the text processing chain, and it must handle their grammatical differences. So I added new code to pick out parenthetical noun phrases and attempt to distinguish them from other constructions in which one noun follows another (it's complicated).
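To make the idea concrete, here is a rough sketch in Python of that kind of reduction. Everything in it (the dictionary format, the function name, the "treat the proper noun as the instance" heuristic) is a stand-in for illustration, not the real Parser or Interpreter code.

# Toy illustration only: reducing a "parenthetical" noun phrase to semantic
# relationship triples like the ones described above. Hypothetical structures,
# not actual Acuitas code.

def reduce_parenthetical(first, second):
    """Take the two paired noun phrases (in either order) and emit
    (subject, relation, object) triples."""
    # Guess which phrase names the instance and which names the category;
    # here the proper noun is simply assumed to be the instance.
    if first["proper"]:
        instance, category = first, second
    else:
        instance, category = second, first
    triples = [(instance["noun"], "is-a", category["noun"])]
    # Adjectives inside the category phrase become qualities of its noun,
    # e.g. "engineer <has-quality> great".
    for adj in category["adjectives"]:
        triples.append((category["noun"], "has-quality", adj))
    return triples

# "I was brought to see Philip Erto, the great engineer."
philip = {"noun": "Philip Erto", "proper": True, "adjectives": []}
engineer = {"noun": "engineer", "proper": False, "adjectives": ["great"]}

print(reduce_parenthetical(philip, engineer))  # one ordering
print(reduce_parenthetical(engineer, philip))  # the other ordering, same triples
# [('Philip Erto', 'is-a', 'engineer'), ('engineer', 'has-quality', 'great')]

Note that the sketch gives the same triples regardless of which phrase comes first, which mirrors the interchangeability in the two example sentences above.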

Three pie charts showing the percentage correct and incorrect for the three test sets: Magic Schoolbus: Inside the Earth (53% / 47%), Out of the Dark (54% / 46%), and Log Hotel (81% / 19%).

After adding this feature, I spent some time on cleanup and a few more ambiguity-resolution abilities. (See the November 2024 Diary for previous examples of this type of work.) All in all, I was able to move every sentence in the Out of the Dark and Magic Schoolbus: Inside the Earth test sets into the "Parseable" category! (All sentences in Log Hotel were already parseable as of January 2024.) This just means that I can construct a data structure that represents the ideal parsed version of the sentence, and it's something the Parser is theoretically capable of generating. I still have a long way to go on getting the Parser to produce correct outputs for all the sentences. (For more information on my benchmarking methods and some early results for comparison, refer to the June 2021 and February 2022 diaries.)
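Conceptually, the scoring behind those pie charts amounts to something like the following simplified stand-in (not the real benchmark harness): each sentence's parser output is compared against the hand-built ideal parse, and the fraction of matches is reported.

# Minimal sketch of the benchmark bookkeeping described above. The data
# structures and names are stand-ins, not Acuitas internals.

def score_test_set(sentences, gold_parses, parse):
    """parse(sentence) should return something comparable to a gold parse."""
    correct = sum(1 for s, gold in zip(sentences, gold_parses) if parse(s) == gold)
    return correct / len(sentences)

# Usage with a stand-in parser function:
# accuracy = score_test_set(log_hotel_sentences, log_hotel_gold, my_parser)
# print(f"{accuracy:.0%} correct")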

I've also done new work on Episodic Memory, but I'll save discussion of that for next month.

Until the next cycle,
Jenny
