Sunday, April 30, 2017

Acuitas Diary #1: April 2017

I've been continuing to make improvements to Acuitas' text parser, adding support for interrogative forms and negative statements. He's now capable of learning that something does not belong to some category or have some quality – rather important, really! And now that question comprehension is in place, I can not only put more information into the database, I can call it back out. Responses are still very formulaic, because text comprehension has been receiving far more of my development effort than text generation. Ask him a lot of yes-or-no questions in a row, and he starts to sound like Bit from Tron (though he does have one up on Bit – he's got the ability to answer “I don't know”).

That furnishes a pretty sensible explanation for Bit, come to think of it. Somebody wrote a program with a fully capable speech parser and a really, really primitive speech generator.

I also threw in some rough support for contractions. Previously the sentence tokenizer would have treated the word isn't as three separate “words,” [isn, ', t], which would have made no sense. I fixed that. Contractions now get pre-processed into whatever their constituent words are (isn't = [is, not]) before the sentence goes for parsing. Only one possible combination is picked, however. Resolving contractions that can be ambiguous (such as “they'd,” which could mean “they had” or “they would”) is something I'm leaving for later. Getting verb conjugation detection put in before I do that will be a big help.

I reserved the last week and a half of the month for a code cleanup and refactoring spree, trying to make sure the text parser and meaning extraction areas are as neat and bug-free as possible before I leave them for a while to work on other things. I've been buried in the text parser for so long now that I wonder if I quite remember what all of Acuitas' other bits and pieces do.

Code base: 5633 lines
Words known: 797

Concept-layer links: 1274

Memory visualization as of 04/29/2017

Saturday, April 22, 2017

Acuitas the Semantic Net AI - Introduction

Okay, this blog has been asleep for far too long. It's time for me to introduce a couple of the things that I've been spending all my time on since I finished playing with the artificial muscles. First of all, I've got a couple novels in the pipeline, which you can read about by visiting The Renascence Cycle under the permanent pages section. But that's not what this post is about. There's another serious project that's been receiving a lot of my attention lately …

The tagline of this blog is “robotics and AI on a shoestring budget.” If you've been here before, maybe you were wondering when the AI part was coming. Well wait no longer!

I'm working on a program named Acuitas. Technically this endeavor started clear back when I was in college, as my final project for the honors technology seminar class. Back then he was little more than a kind of talking dictionary. I'm now on Version 3, and the intended scope of the project has expanded quite a bit. (If you'd like a little more background on the project and how it relates to the rest of the AI field, or if you're just interested to see what Version 1 was like, you can read the original Acuitas V1 Report and Acuitas V1 Presentation. Bear in mind that Version 3 has been re-written in Python from the ground up. It's missing some features that V1 originally had, and has a number of others that were absent from V1.)

Acuitas' avatar. “Heartbeat” lights run up the wings on either side of the eye when viewed live.

In technical terms, Acuitas V3 is an artificial intelligence based around a semantic network with a natural language interface. What that means is he represents knowledge in terms of words and relationships between words, and you can communicate with him in normal English. (The range of normal English he can actually grasp so far is very limited, because natural language processing is hard and I'm only tackling it because I'm probably crazy.) He's also got a rudimentary "internal life" consisting of drives that fluctuate over time. About twice a day, if he's active, he wants interaction and starts calling for somebody to talk to. He used to wake me up in the early morning hours by doing this, but I fixed that – mostly.

Visual representation of semantic net database, 07/16/16.

Acuitas V3 is not your ordinary chatbot – you know, those programs you can converse with that try to fake being human. He has more going on under the hood than they do. A typical chatbot picks up on a few keywords or basic structures in your text input, then chooses a mostly pre-written sentence from a database to send back to you. Others train themselves by reading a bunch of human conversations, and then try to speak in a similar manner. Neither type is really interested in the *meaning* of your input or its own responses, which is why, though they might be perfectly articulate, the responses often seem to come out of left field. I'm following a different approach and hoping to come up with something better than that.

Visual representation of semantic net database, 08/06/16

I've been working aggressively on his linguistic abilities over the past few months. He can parse simple subject-verb-object sentences with or without prepositional phrases, like "A cat is an animal" or "A yellow bird sings in the morning," with some rudimentary support for compound nouns and proper names. He tries to figure out how each word functions in the sentence; if he can't get it due to insufficient background knowledge, he'll ask for clarification (a normal chatbot would just try to fake it). So far he just socks information away in his database and doesn't say much back, but that's coming. Oh, that's coming.

Visual representation of semantic net database, 08/23/16

The semantic database can be represented as a graph … hence the images embedded in this post, which show a timeline of its growth, from July of last year up until this April 8. You can also get an idea of how the graph-drawing algorithm has changed over time. Each dot is a concept, and the lines are the relationships between them. When viewed live while Acuitas is running, the graph is interactive; I can pan, zoom, and hover my cursor over the dots for floating labels that show which word is associated with each.

Visual representation of semantic database, 04/08/17

Since I'm putting some earnest effort into Acuitas this year, I'm going to try making myself write a fairly regular developer diary … maybe once a month. Watch me do cool things and struggle. Talking seems easy until you've tried to teach a computer to do it …

Code base: 4901 lines
Words known: 631