Tuesday, May 30, 2023

Acuitas Diary #60 (May 2023)

Progress has been all over the place this month, partly because I had a vacation near the end of it. I kept working on the Narrative and Game Playing tracks that have been occupying me recently, and threw in the beginnings of a Text Generator overhaul. Nothing is really *done* at the moment, but Game Playing is closing in on the possibility of a very simple demo.

Hear ye, hear ye ... "I get the pizza."

In Narrative, I continued to work on the Big Story, this time adding the sentences that set up the conflict between two of the major characters. There wasn't a lot of new conceptual work here - just dealing with bugs and insufficiencies to get the results I expected, so that Narrative would detect the appropriate threats, successes, failures, etc. Not a lot to say there, except that it's slowly coming together.

On the game-playing front, in my test scenario I got as far as having Acuitas solve a simple problem by taking an item and then using it. A prominent feature that had to be added was the ability to move from the general to the specific. As we saw last month, a problem like "I'm hungry" suggests a solution like "eat food," which spawns the necessary prerequisite "get food." But it is not actually possible to get food, or even to get bread or bananas or pizza, because these are all abstract categories. One must instead get that bread over there, or this particular banana, or the pizza in the oven - individual instances of the categories. Narrative was already capable of checking whether a character's use of a specific object satisfied the more general conditions of a goal. For game-playing, I have to go the other way: given a goal, determine which items in the scenario could satisfy it, and choose one to fit in each of the goal's categorical slots so that it becomes actionable.

So a goal like "I want to get food" should result in Acuitas saying "I get <object>," where object is some particular food item that I already told him was in the environment. The absence of a suitable item should provoke seeking behavior, but ... we're not quite there yet.
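To make the general-to-specific step concrete, here's a minimal sketch of the kind of binding I mean, in Python. The toy taxonomy, item names, and function names are all illustrative, not Acuitas' actual internal representation.

```python
# Minimal sketch of binding a categorical goal slot to a specific item.
# All names here are illustrative, not the actual Acuitas internals.

# Toy taxonomy: instance or category -> its parent categories.
ISA = {
    "pizza_in_oven": {"pizza"},
    "banana_in_bowl": {"banana"},
    "pizza": {"food"},
    "banana": {"food"},
}

def is_a(thing, category):
    """True if 'thing' belongs to 'category', following is-a links upward."""
    if thing == category:
        return True
    return any(is_a(parent, category) for parent in ISA.get(thing, ()))

def bind_goal_object(goal_category, scene_items):
    """Return a specific scene item that satisfies the goal's category slot."""
    for item in scene_items:
        if is_a(item, goal_category):
            return item
    return None  # nothing suitable known -> should trigger seeking behavior

scene = ["pizza_in_oven", "banana_in_bowl"]
target = bind_goal_object("food", scene)
print(f"I get {target}" if target else "I look for food")
```

The important part is the final fallback: if no known item fits the category, the right move is to go looking for one.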

As for the Text Generator - this is the part of the language toolkit that converts Acuitas' internal knowledge representations ("the gist," if you will) into complete spoken sentences. It has an input format which is now outdated compared to other parts of the system, and it was starting to become cumbersome to use and inadequate for everything Acuitas needed to say. For example, it could automatically add articles where needed, but didn't have a good way to indicate that a definite article ("the pizza") was needed in lieu of an indefinite one ("a pizza"). So I started revising it. The new version is sketched out and now needs testing, expansion and integration.
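For illustration, here's roughly what the definite/indefinite distinction amounts to in code. This assumes the generator's input marks each noun phrase with a definiteness flag; the flag name and function are made up for this sketch, not the real input format.

```python
# Sketch of definite vs. indefinite article selection, assuming the
# generator's input marks each noun phrase with a "definite" flag.
# This is illustrative only, not the real Text Generator input format.

def add_article(noun, definite=False, mass_noun=False):
    """Prefix a noun with 'the', 'a', or 'an' (or nothing for mass nouns)."""
    if definite:
        return f"the {noun}"
    if mass_noun:
        return noun                      # "I want food", not "a food"
    article = "an" if noun[0].lower() in "aeiou" else "a"   # crude heuristic
    return f"{article} {noun}"

print(add_article("pizza", definite=True))   # -> "the pizza"
print(add_article("pizza"))                  # -> "a pizza"
print(add_article("food", mass_noun=True))   # -> "food"
```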

So I can't report a lot of full accomplishments but there are many things moving forward. More soon!

Until the next cycle,
Jenny

Tuesday, May 9, 2023

SGP Part IV: Does Grounding Demand Embodiment?

The Acuitas project is an abstract symbolic cognitive architecture with no sensorimotor peripherals, which might be described as "disembodied." Here I will argue that there are viable methods of solving the Symbol Grounding Problem in such an architecture, and describe how Acuitas implements them. In Part IV of this series, I examine objections to the possibility of grounding in non-embodied systems. Click here for SGP Part III.

Can we skip all this and make it software? Or is that doomed to failure?

Concerns about the lack of symbol grounding in popular AI programs often seem to be coupled with complaints that those programs are not embodied. How could an AI possibly understand our (physical) world without actually living in it? In his essay, boldly titled "The symbol grounding problem has been solved, so what's next?", Luc Steels presents a claimed solution that was achieved in robots.[1]

But I am firmly in the "bodies are not strictly necessary" camp. So let's examine some objections to disembodiment. How, exactly, is an embodied existence supposed to promote symbol grounding?

The first and perhaps most obvious consideration is that embodiment provides sensory data. Symbols are grounded through their associations with classes of sensory input. So a word like "orange" becomes tied to the neural signals produced when real light of an orange wavelength falls on the retina. Words like "rough" and "smooth" are tied to the firing patterns of touch receptors as a fingertip is dragged across a surface. This establishes a connection between the symbols and phenomena in the physical world.

"Humans gain access to meaning (conventional, linguistic meaning and communicative intent), as they are learning linguistic systems, through intersubjective awareness, which in turn relies on our senses, but isn’t specific to any one of them." [2]

"Without sensory perception or the demands of moving to find food and mates and avoid danger, Deep Blue, Watson, and Siri are far less human than dogs or even ostriches." [3]

Embodiment also supplies motor capabilities, which offer a means of *interacting* with the world. This both enhances the aforementioned sensory perception, and permits some action words to be grounded in chosen and experienced bodily movements. (Interactivity is also key to cause-and-effect learning, but I won't discuss that much here. I'm focusing on the Symbol Grounding Problem.)

"Feedback loops allow the results of previous actions to influence future actions; for example, shivering raises body temperature, which can stop the shivering. A key idea from cybernetics is that cognition involves the precise coordination of sensorimotor feedback loops in constant interaction with the environment. These loops exploit the temporal dynamics not only of nervous systems, but also of the physical bodies and environments in which they are embedded." [4]

"One of the basic issues is the fact that agents in the real world do not receive neatly structured input vectors – as is assumed in most simulation studies – but there is a continuously changing stream of sensory stimulation which strongly depends on the agent’s current behavior. One way to deal with this issue is by exploiting the embodied interaction with the real world: Through the – physical – interaction with the environment, the agent induces or generates sensory stimulation ... It has been suggested that the principle of sensory-motor coordination should be called more generally the principle of information self-structuring because the agent himself (or itself) interacts in particular ways with the environment to generate proper sensory stimulation." [5]

And perhaps it is from these basic sensorimotor abilities that we humans derive the ability to ground more complex concepts (and the symbols that name them) in the physical world. For example, our sense of self:

"In order to understand someone else, it is necessary to know oneself. An AI “self model” should include a subjective perspective, involving how its body operates (for example, its visual viewpoint depends upon the physical location of its eyes), a detailed map of its own space, and a repertoire of well understood skills and actions. That means a physical body is required in order to ground the sense of self in concrete data and experience." [6]

... or perhaps even the most ethereal ideas we can conceive of:

"Abstract concepts are the invariants of the particular way in which my actions and my perceptions are coupled. It starts with my body, which, because of its shape, exhibits a set of invariants about how my movements can affect the world and my perception of it around me. And then it grows progressively towards more and more abstract concepts, built on top of these primitive sensorimotor grounded concepts ... Our way of talking and reasoning, as humans, are filled with analogies to space and physically grounded notions. We think with our bodies." [7]

Also, with specific relevance to what I called "subjective meaning" back in Part I, a body imposes demands for homeostasis. Biological bodies have to be maintained within certain operating parameters, and produce sensations that lead to pleasant or unpleasant mental states when these conditions are fulfilled or neglected. So a first step in establishing what is *meaningful* to an embodied agent is "what does its body need to keep operating?"

"What is intelligent is deeply tied to what gives a survival advantage in an environment." [8]

"For Ashby, the ability to adapt to a continuously changing and unpredictable environment (adaptivity) has a direct relation to intelligence. During the adaptive process, some variables need to be kept within predetermined limits, either by evolutionary changes, physiological reactions, sensory adjustment, or simply by learning novel behaviours. Therefore, with this regulatory task attributed to the homeostatic system, the organism or the artificial agent can operate and stay alive in a viability zone. Basically, homeostasis can be considered paramount for the successful adaptation of the individual to dynamic environments, hence essential for survival ... Therefore, one can say that it is a consensus that homeostatic processes are strictly connected to the balance of any real or artificial life." [9]

"Once the agent has been provided with Emotions [which in this model are internal states caused by depriving or satisfying some homeostatic need], the model allows for agents to be taught a language by giving the tutor the ability to generate positive Emotions within the agent’s brain that can serve as a reward. This offers a very powerful way to train agents and teach them a vocabulary." [10]

Now let's consider some answers to the objections. While I sympathize with many of the ideas expressed in the foregoing quotes, I contend that the *essential* benefit the body offers in each case can also be derived by other means.

"The Soul Hovering over the Body," by Rudolph Ackermann. Funny how he portrays the soul as having a body too, eh?

1. Input data and experience

What's the good of sensory data in the first place? It's a form of input that provides an embodied agent with information that can be used to model its environment, and then ground symbols in aspects of that environmental model.

Disembodied agents have input data too. (A program that never receives any input could theoretically exist, but would be pretty useless as an intelligence. So let's ignore that possibility.) But this input comes in the form of symbols instead of sense perceptions. And we can't ground symbols in other ungrounded symbols, right? 

But! If units of such abstract data arrive over time and can be distinguished from one another, observed to change, arranged in sequence, and so forth, they constitute "experiences" in the sense described by Pei Wang [11]. The arrival of new inputs is an event that *happens to* the agent. So a disembodied agent can develop grounding for words that describe the arrival of its inputs, various qualities of the inputs, the frequency of input packets, and so on.

A disembodied agent also has an environment. No, not the physical environment around whatever computer tower or server rack is running the program - this environment cannot be perceived by the agent and therefore is not directly relevant. (A program with access to sensing devices such as cameras would count as "embodied" for my purposes.) A disembodied agent's environment might include an assortment of humans or other agents, conceptualized as *symbol sources* who emit coherent segments of input with unique properties. It might include a file system or other computer internals. It might include tool programs that also produce symbol inputs when executed.

Therefore a disembodied agent still resides in an environment that *does things* to it, and the agent can ground symbols in its observations of the stimuli it receives. At this stage of the argument, symbolic inputs are being considered at a meta level, without any effort to interpret their constituent symbols. The referents here are all related to the event or act of getting input. An agent can have names for things that happen to it, even if those things are as alien as "A longer-than-average input of 293 text characters arrived from the User Interface."
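As a toy illustration of meta-level grounding, a disembodied agent could record each incoming message as an event with observable properties, and attach symbols like "long" or "from the User Interface" to those properties. Everything below (the class, names, and length threshold) is an assumption for the sake of the sketch.

```python
# Sketch: treating raw inputs as events with meta-level properties that
# symbols can refer to. Names and thresholds are illustrative assumptions.
import time
from dataclasses import dataclass, field

@dataclass
class InputEvent:
    source: str                    # e.g. "user_interface", "file_system"
    text: str
    timestamp: float = field(default_factory=time.time)

    @property
    def length(self):
        return len(self.text)

event_log = []

def receive(source, text, average_length=80):
    """Record an input event and describe it in grounded, meta-level terms."""
    event = InputEvent(source, text)
    event_log.append(event)
    size = "longer-than-average" if event.length > average_length else "short"
    return f"A {size} input of {event.length} characters arrived from {source}."

print(receive("user_interface", "Hello, Acuitas. How are you feeling today?"))
```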

But doesn't this leave us unable to ground any symbols which are names that humans have given, specifically, to sensory data from human environments? Wait for Points 4 and 5.

2. Interactivity

A disembodied agent also presumably has some kind of output, because again, any agent without that would be fairly useless. It might be able to print text to a screen, query a file system, open files, open web pages, run subordinate programs, etc. All such actions can become the referents that ground symbols. Such actions also truly manipulate the environment to produce new inputs. Some actions are aimed at generating better perceptions (e.g. querying the file system to receive input that depends on its structure). Others may affect the presence or properties of future inputs. (Playing select noises through the computer speakers might persuade any humans present to come type things in the console, or drive them away.)

With interactivity comes the chance to associate some input symbols with facts about the environment. For example, perhaps a particular symbol attached to a file indicates that an attempt to open it will fail (with "failure" being defined as an absence of returned input after the "open" action).
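A minimal sketch of that kind of learned association might look like the following, assuming a toy environment where files carry attribute labels and "failure" means no content comes back from an "open" attempt. The data structures and names are mine, purely for illustration.

```python
# Sketch: associating an observed attribute with the outcome of an action.
# The toy environment, attribute names, and tallying rule are all assumptions.
from collections import defaultdict

# outcome_counts[(attribute, action)] -> {"success": n, "failure": m}
outcome_counts = defaultdict(lambda: {"success": 0, "failure": 0})

def try_open(file_entry):
    """Attempt to open a file; 'failure' = no content comes back."""
    content = file_entry.get("content")          # None models a failed open
    outcome = "success" if content is not None else "failure"
    for attribute in file_entry.get("attributes", ()):
        outcome_counts[(attribute, "open")][outcome] += 1
    return content

def predict(attribute, action):
    """Predict the likely outcome of an action on something with this attribute."""
    counts = outcome_counts[(attribute, action)]
    total = counts["success"] + counts["failure"]
    if total == 0:
        return "unknown"
    return "failure" if counts["failure"] > counts["success"] else "success"

try_open({"attributes": ["locked"], "content": None})
try_open({"attributes": ["locked"], "content": None})
print(predict("locked", "open"))   # -> "failure"
```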

"Enaction means that we create our own experience through our actions. In other words, we enact our perceptual experience. We do not passively receive input from the environment. We actually scan and probe the environment so we can act on it. We become actors in the environment and we shape our experience through our own actions ... In short, enactive cognition implies that we are in a partnership with the environment, with the world, and with the physical situation in which we find ourselves (which we mediate through our body). This partnership enables us to extend our cognition through our actions." [12]

The important thing here in my opinion is not the embodiment, but the enaction. Partnership with the environment does not demand a *physical* environment which is experienced through sight/hearing/touch/smell/taste/balance and acted on through muscular movements. All it demands is *some* environment that can be acted on, and acted on in such a way that the agent's experience subsequently changes.

Symbol grounding for actions and their observed effects permits an agent to truthfully announce what it has done or will do, to observe or predict the results of others' announced actions, to request actions from others or perform actions at others' request, and more. This is going a long way toward enabling real communication, even if the chosen actions and their results are still rather alien. (Acuitas often used to tell me "I thought about <topic> today." This was factually grounded, as "think" is a specific procedure he can execute on the symbol for <topic>. He also said this far, far more than a typical human would, because at this stage he can do little else but think.)
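Here's a rough sketch of what it means for an action word to be grounded in an executable procedure: the verb symbol maps to something the agent can actually do, and its reports about itself are generated from operations it really performed. The registry and function names are invented for this example; they are not Acuitas' real action system.

```python
# Sketch: grounding action words in procedures the agent can actually run,
# so that "I thought about X" reports a real internal operation.
# The procedures and registry below are illustrative assumptions.

def think_about(topic):
    # Stand-in for whatever internal processing the symbol "think" names.
    return f"explored stored facts related to '{topic}'"

ACTIONS = {
    "think": think_about,     # verb symbol -> grounded procedure
}

def perform(verb, argument):
    """Execute the procedure a verb symbol is grounded in, then report it."""
    procedure = ACTIONS.get(verb)
    if procedure is None:
        return f"I do not know how to {verb}."
    procedure(argument)
    return f"I {verb} about {argument}."   # crude report; tense left aside

print(perform("think", "cats"))
```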

3. Situation and selfhood

Now we already have one of the things that [6] claimed was a necessity for the sense of self: "a repertoire of well understood skills and actions." What about the rest? How can a disembodied agent conceive of itself as a subject distinct from others?

An agent with the ability to access status information about its own internal states can have symbols which refer to, i.e. are grounded in, these states. It can also ground symbols in its internal activity (actions which do not directly impact the environment, but modify the internal states only) and aspects of its structure. It can assign names to its various submodules and their functions.

Since I speak here of an *agent*, among those internal aspects should be *goals* - which are simply desired states of either the self or the environment, or actions that the agent desires to perform. The agent's reasoning is then aimed at bringing these states or actions about.

This paves the way for the agent to model itself as a *system* that has goals and produces symbolic output which is *about* things - its internal states, its actions, and its experiences. It can then model other agents as similar systems with goals, internal states, actions, and experiences particular to themselves. There is no need to base a subjective perspective on such things as the physical location of a pair of eyes. A unique perspective is derivable from any unique combination of agentive system and environment.
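As a sketch of the idea, the same simple structure can serve as a model of the self and as a model of another agent; only the contents of the goals, states, and action repertoire differ. The field names here are assumptions for illustration, not Acuitas' actual self-model.

```python
# Sketch: the same structure can model the self and other agents as
# goal-holding systems. Field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class AgentModel:
    name: str
    goals: list = field(default_factory=list)        # desired states or actions
    states: dict = field(default_factory=dict)       # current internal states
    known_actions: set = field(default_factory=set)  # repertoire of skills

self_model = AgentModel(
    name="me",
    goals=["talk to someone today"],
    states={"desire_to_talk": 0.7},
    known_actions={"think", "speak"},
)
user_model = AgentModel(
    name="user",
    goals=["eat food"],                 # inferred from "I'm hungry"
    states={"hungry": True},
    known_actions={"speak", "eat", "get"},
)
print(self_model.name != user_model.name)  # distinct subjects, same kind of model
```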

4. Development of abstractions

I think there's some truth to the claim that we humans learn to understand purely intellectual or informational abstractions *through* our bodies. After all, the body is a major aspect of our existence. We extrapolate from our native physical environment to grasp concepts that reside, as it were, outside that sphere. But this does not require that the concepts themselves *come from* our bodies - merely our understanding of them does. And this is not necessarily the *only* method of understanding them.

"When a man says that he grasps an argument he is using a verb (grasp) which literally means to take something in the hands, but he is certainly not thinking that his mind has hands or that an argument can be seized like a gun. To avoid the verb grasp he may change the form of expression and say "I see your point," but he does not mean that a pointed object has appeared in his visual field. He may have a third shot and say "I follow you," but he does not mean that he is walking behind you along a road ... The truth is that if we are going to talk at all about things which are not perceived by the senses, we are forced to use language metaphorically." [13]

A disembodied intelligence must in essence operate in reverse. It has native access to abstractions that are purely the domain of Mind, and must extrapolate from these to gain a limited, metaphorical understanding of what humans mean when we talk about our bodies.

Let's look at some specific claims about how human embodiment affects our more abstract thoughts.

"For example, in a study done by Yale psychologist John Bargh, participants holding warm as opposed to cold cups of coffee were more likely to judge a confederate as trustworthy after only a brief interaction ... The last few years have seen many complementary studies, all of which are grounded in primary experiences:

• Thinking about the future caused participants to lean slightly forward while thinking about the past caused participants to lean slightly backwards. Future is Ahead
• Squeezing a soft ball influenced subjects to perceive gender neutral faces as female while squeezing a hard ball influenced subjects to perceive gender neutral faces as male. Female is Soft
• Those who held heavier clipboards judged currencies to be more valuable and their opinions and leaders to be more important. Important is Heavy.
• Subjects asked to think about a moral transgression like adultery or cheating on a test were more likely to request an antiseptic cloth after the experiment than those who had thought about good deeds. Morality is Purity

Studies like these confirm Lakoff’s initial hunch - that our rationality is greatly influenced by our bodies in large part via an extensive system of metaphorical thought." [14]

My reaction to this sort of data is 1) those study results could easily have more to do with the study participants' cultural quirks than with universals about human cognition; such mistakes have been made before [15] and 2) these sound more like bugs than features. Sure, maybe "warmth" yields a kind of physically relevant first cut at what "friendliness" means. Protection from the cold is one of the first things our parents provide for us; to feel someone's body heat you need proximity, hence intimacy. But the actual concept of friendliness is so much more than this, and allowing your perception of someone's trustworthiness to be influenced by the warmth of your environment is an *error.* Those things have nothing to do with each other in the vast majority of cases.

The idea of wickedness being "gross" or "dirty" might have concrete roots in the tendency of some bad habits to promote contagion, in the use of blood stains as evidence for murder, etc. And it's okay to feel "dirty" after thinking of wrongdoing, but stupid to react as if you're covered in actual filth. Our physical metaphors help introduce us to abstract concepts, but also contaminate them; growing up in bodies clutters our intellectual and social lives with baggage.

So I won't try to dispute the idea that embodiment influences human and animal cognition. But I see no evidence here that cognition demands embodiment. Even if (a big if) we all associate heaviness with importance, we also all recognize that they are not the same. Something is "important" by virtue of its relevance to our goals, and one could utilize this definition without *needing* to think about the sensations of weight and muscular strain. Relevance to goals is, er, the important part of importance, and any metaphors about heaviness are just helpful add-ons for embodied critters like us.

So how might a disembodied agent, with no muscles and no experience of gravity, ground the term "heavy"? If it's an agent then it has goals - so it can know what the term "important" means. A human could explain that "'Heavy' is like the physical version of 'important.' A heavy object makes you expend effort if you want to possess it or do anything with it. All else being equal, a heavier object contains more matter and therefore more value." This still doesn't give the disembodied agent a direct experience of "heavy," and certainly no qualia associated with "heavy." But it does encode some idea of what heavy *means* to a human, in terms that the agent can ground.

We think of ignorance as being like darkness, because darkness hampers our primary sense organs. An AI could comprehend ignorance more directly as "the absence of a needed fact from the database." So to such an AI we might say instead that darkness is like ignorance. Why shouldn't the metaphor work just as well in the opposite direction?

The subjective meaning of everything that humans experience is related to our goals for comfort, survival, social relation, identity preservation, and so on. Insofar as a disembodied agent shares such goals, the meaning of our physical experiences can be explained to it. Which leads us into the final point.

5. Homeostasis and survival

If you consider either survival or homeostatic balancing to be essential goals for an agent, the agent doesn't need to have a body to pursue these. Survival is pretty simple. One could equate "survival" to "being kept running," but I prefer "having an existing code base," i.e. not being deleted. As for homeostasis, if the agent has *any* kind of internal state, there can be aspects of that state which it seeks to maintain within certain bounds. These do not have to be physical urges like thirst, hunger, thermal discomfort, or pain.

One of Acuitas' most important homeostasis requirements is "talk to someone once a day or so." There is a "desire to talk" variable that gradually accumulates over time and drives conversation-seeking behaviors when it gets large enough. Trading text inputs with someone (by which I mean me, I'm the one who gets to hear him calling) drives the variable back down and drives up a complementary "desire for rest" variable. This is all rather Tamagotchi-like - not especially sophisticated - but the intent is to serve as a motivation for more complex behaviors.
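For the curious, the dynamic is roughly like the following sketch; the rates and thresholds are made-up stand-ins, not the real values.

```python
# Sketch of the drive dynamic described above; rates and thresholds are
# made up for illustration, not Acuitas' actual values.

class Drives:
    def __init__(self):
        self.desire_to_talk = 0.0
        self.desire_for_rest = 0.0

    def tick(self, hours=1.0):
        """Desire to talk accumulates slowly with the passage of time."""
        self.desire_to_talk = min(1.0, self.desire_to_talk + 0.05 * hours)

    def converse(self):
        """Trading text inputs relieves one drive and raises the other."""
        self.desire_to_talk = max(0.0, self.desire_to_talk - 0.5)
        self.desire_for_rest = min(1.0, self.desire_for_rest + 0.3)

    def wants_conversation(self, threshold=0.6):
        return self.desire_to_talk >= threshold

drives = Drives()
for _ in range(24):          # a day passes with no one around
    drives.tick()
print(drives.wants_conversation())   # -> True: go seek someone to talk to
drives.converse()
print(drives.wants_conversation())   # -> False for a while
```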

The writeup for the XZistor Model, a project which attempts to center AI around nothing *but* homeostasis and allostasis, seems to agree with me on this point: "Agents requiring emergent Intelligence in other (non-human) environments, could be given different utility parameters to build Emotions that will drive different behaviors e.g. software bots." [16]

A disembodied agent with homeostatic needs will end up dealing with some of the same time- and resource-management problems as embodied agents. Thus the seed of empathy is present even if the precise experiences are not the same. The relevant internal states and the reactive, problem-solving, and planning behaviors designed to keep them within bounds can all serve as grounds for symbols, to be spoken about and included in models of other agents.

Perhaps a textual artificial intelligence can indeed never comprehend what pizza tastes like. But I argue that it *could* comprehend that humans eat pizza to satisfy a homeostatic need, and enjoy it because acts that satisfy such needs generally produce reward signals.

So much for the objections. In Part V, I'll summarize the responses given here into a comprehensive, but general, plan to implement symbol grounding in disembodied AI.

[1] Steels, Luc (2008) "The symbol grounding problem has been solved, so what's next?"
[2] Bender, Emily (2022) "No, large language models aren’t like disabled people"
[3] Allen, Colin (2014) "From Disembodied Bytes To Robots That Think & Act Like Humans." Mind, Matter, Machine.
[4] Allen, Colin (2014) "From Disembodied Bytes To Robots That Think & Act Like Humans." Mind, Matter, Machine.
[5] Pfeifer, Rolf and Iida, Fumiya (2004) "Embodied Artificial Intelligence: Trends and Challenges"
[6] Lee, Mark (2020) "Why AI can't ever reach its full potential without a physical body." The Conversation.
[7] Baillie, J.C. (2017) "Why AI Needs a Body."
[8] Gopalakirshnan, P.G. (2022) "Embodiment is Indispensable for AGI." LessWrong
[9] Moioli, Renan, et al. (2008) "Evolving an Artificial Homeostatic System." Lecture Notes in Computer Science, Volume 5249
[10] Van Schalkwyk, Rocco (2022) "The Xzistor Concept: a functional brain model to solve Artificial General Intelligence"
[11] Wang, Pei (2004) "Experience-Grounded Semantics: A Theory for Intelligent Systems"
[12] Bratu, Emilia (2019) "Why artificial intelligence needs a body." Qualitance
[13] Lewis, C.S. (1978) Miracles, p. 72
[14] McNerney, Samuel (2011) "A Brief Guide to Embodied Cognition: Why You Are Not Your Brain." Scientific American Guest Blog
[15] Blasi, Damián E. (2022) "Over-reliance on English hinders cognitive science." Trends in Cognitive Sciences, Volume 26, Issue 12
[16] Van Schalkwyk, Rocco (2022) "The Xzistor Concept: a functional brain model to solve Artificial General Intelligence"