The Acuitas project is an abstract symbolic cognitive architecture with no sensorimotor peripherals, which might be described as "disembodied." Here I will argue that there are viable methods of solving the Symbol Grounding Problem in such an architecture, and describe how Acuitas implements them. In this first article, I introduce the Symbol Grounding Problem for anyone who is not already familiar with it.
Part I: A Description of the Symbol Grounding Problem
Mean anything to you? (Screenshot of Myst III: Exile via mystjourney.com, copyright Ubisoft)
Technically the symbols don't have to be words, though language processing is the context in which the problem is most often illustrated. A "symbol" is anything that points to some "referent," such that an intelligent agent who has learned this symbol's meaning can "pick out" the referent upon observing the symbol. (What does "pick out" imply? I would argue that it certainly means "think of," but also goes so far as "find/recognize in the environment," "act on," etc. The point of thinking about referents is usually to do something about them.) A symbol is also probably part of a "symbol system" which allocates a variety of referents to a collection of complementary symbols. The system includes rules for manipulating the symbols to produce combined or derivative meanings; the grammatical rules for composing sentences are an example. "Grounding" is the process of associating symbols with their referents. [1]
For an example of what working with ungrounded symbols is like, try reading a page of text in a language you do not know. Even if you also have a dictionary in this language, looking up the words won't help, because they're only defined in terms of other unknown words. Trying to use the dictionary will lead you on an endless circular path that never arrives at real meaning.
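To make that circularity concrete, here is a toy sketch in Python (the tokens and the whole "dictionary" are invented for illustration; this is not code from Acuitas or anywhere else). Every entry is defined only in terms of other entries, so following the definitions never bottoms out in anything outside the symbol system:

```python
# A toy "dictionary" for an unknown language: every word is defined
# only in terms of other words from the same dictionary.
dictionary = {
    "glork":   ["fribble", "snarp"],
    "fribble": ["snarp", "glork"],
    "snarp":   ["glork", "fribble"],
}

def chase_definitions(word, steps=8):
    """Follow definitions, hoping to reach something that isn't just another word."""
    path = []
    for _ in range(steps):
        path.append(word)
        word = dictionary[word][0]   # look up the first word of the definition
    return path

print(chase_definitions("glork"))
# ['glork', 'fribble', 'snarp', 'glork', 'fribble', 'snarp', 'glork', 'fribble']
# The lookup cycles forever; nothing in the system points outside the system.
```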
Two more terms that often come up in connection with the SGP are "semantics" and "syntax." "Semantics" is a formal term for meaning or the study thereof, while "syntax" refers to the structural or manipulative rules that are part of a symbol system. It is notable that syntax does not need semantics; it is based on the forms of the symbols themselves, so manipulations that follow the rules can be carried out without knowledge of the symbols' meanings. However, syntax without semantics is arguably not very useful, as it only serves to transform one string of gibberish into another.
For example, suppose I propound the following statements:
All muips are weetabiners.
Paloporoloo is a muip.
Then you could conclude with certainty that "Paloporoloo is a weetabiner." By logic and the syntactic rules of the English language, this is a correct deduction! But how does it help you? You know neither what Paloporoloo is nor what a weetabiner is. So there's not much you can *do* with the information.
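In fact, that deduction can be mechanized with no grounding whatsoever. Below is a minimal sketch in Python (my own illustration, not code from Acuitas or any particular reasoner): a tiny forward-chaining rule applier that matches facts purely by the shapes of their tokens, and derives "Paloporoloo is a weetabiner" without ever knowing what either word means.

```python
# Purely syntactic inference: facts and rules are just token patterns.
# The program never "knows" what a muip or a weetabiner is, yet the
# deduction goes through on form alone.
facts = {("is_a", "Paloporoloo", "muip")}

# "All muips are weetabiners": if X is_a muip, then X is_a weetabiner.
rules = [(("is_a", "?x", "muip"), ("is_a", "?x", "weetabiner"))]

def forward_chain(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (p_pred, _, p_obj), (c_pred, c_subj, c_obj) in rules:
            for (f_pred, f_subj, f_obj) in list(derived):
                if f_pred == p_pred and f_obj == p_obj:   # match on token shapes only
                    binding = {"?x": f_subj}
                    new_fact = (c_pred,
                                binding.get(c_subj, c_subj),
                                binding.get(c_obj, c_obj))
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived

print(forward_chain(facts, rules))
# {('is_a', 'Paloporoloo', 'muip'), ('is_a', 'Paloporoloo', 'weetabiner')}
```

The output is formally correct and semantically empty, which is exactly the point.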
To a human, perhaps the most obvious kind of referent is "something out there in the world" - an object, or a fellow embodied agent, or part of a landscape. We can also refer to properties of these things (color, shape, size, age) and to changes in them: actions that they take or that may be taken upon them. But referents include a variety of intangibles too: physical things that can't be directly touched or pointed to (time, energy); systems, organizations, philosophies, and methods (water cycle, nation, liberalism, science); things inside our own minds (idea, memory, decision, emotion); and abstract standards or states of being (love, justice, freedom, beauty). Symbols can even refer to other symbols (word, glyph, number), or to grammatical or logical structure in a sentence (the, and, that).
A symbol can be utterly arbitrary.[2] Some symbols take on a bit of flavor from their referents - onomatopoeic words, for instance, or ideograms that look like stick figures of the objects they represent. But this is not necessary. Any piece of data you like will do to symbolize anything you like. If you hope to use symbols for communication, then the mapping of symbols to referents must be largely agreed upon by you and your communication partner; this is the only real constraint.
Words are just collections of sounds (or squiggles on a surface), chosen by no rule that connects them to the things they name. They are in no way *inherently* tied to or derived from the referents to which they point. "A rose by any other name would smell as sweet." And this is why the Symbol Grounding Problem is a Problem. Symbols do not map themselves; you need more than just the symbols in order to connect them with their referents. [3] The fabled True Speech is not ours, and the name of the rose without the rose itself is futile.
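To make the arbitrariness, and the need for agreement, concrete, here is a tiny toy example in Python (the tokens and referent labels are made up; this is not anything from Acuitas). A token only "picks out" a referent for a listener whose lexicon happens to contain the same pairing the speaker used:

```python
# Arbitrary tokens only function as symbols when both parties share the mapping.
speaker_lexicon  = {"rose": "FLOWER_37", "zonk": "FLOWER_37"}  # the speaker could use either token
listener_lexicon = {"rose": "FLOWER_37"}                       # the listener only knows "rose"

def interpret(token, lexicon):
    return lexicon.get(token, "<no referent picked out>")

print(interpret("rose", listener_lexicon))  # FLOWER_37 -- shared pairing, communication succeeds
print(interpret("zonk", listener_lexicon))  # <no referent picked out> -- just as arbitrary, but unshared
```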
As a (presumably) human reader you might still be wondering what the big deal is. Of course we go outside words to learn the meanings of words, but that's easy enough, right? A baby can do it. Babies learn words by hearing them in association with some experience of their referents. With time and repetition, a mental link is formed between the two. But now consider the issue from the perspective of an artificial intelligence that has no robotic body, exists as an abstraction inside a computer tower, and *only* processes words. A number of past and present attempts at AI fit this description. How shall they know what words mean?
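For a rough sense of what that associative process might look like if caricatured in software, here is a hedged sketch of simple cross-situational learning (my own toy illustration; it is not a claim about how infants actually learn, nor about Acuitas's method). Count how often each heard word co-occurs with each perceived referent, and let the strongest pairing win:

```python
# Toy cross-situational word learning: associate heard words with
# whatever referents are perceived at the same time, across many episodes.
from collections import defaultdict

co_counts = defaultdict(lambda: defaultdict(int))  # word -> referent -> co-occurrence count

def observe(utterance_words, perceived_referents):
    """One learning episode: words heard alongside things currently experienced."""
    for w in utterance_words:
        for r in perceived_referents:
            co_counts[w][r] += 1

# Repeated, varied episodes gradually disambiguate the mapping.
observe(["look", "dog"],  ["DOG", "BALL"])
observe(["nice", "dog"],  ["DOG", "SOFA"])
observe(["red",  "ball"], ["BALL", "DOG"])

def best_referent(word):
    candidates = co_counts[word]
    return max(candidates, key=candidates.get) if candidates else None

print(best_referent("dog"))   # DOG -- it co-occurred twice, everything else only once
```

Notice that the sketch quietly assumes the hard part: a perceptual system that already delivers discrete referents like DOG and BALL to associate with.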
Even if we did presume to place an AI mind in a suitable robotic body and let it "grow up like a baby," we would hit a wall: the human learning process is not fully understood, and not so simple to replicate as it might seem. Just establishing the low-level processing needed to categorize sensory experiences is a massive undertaking that remains incomplete. Here lies the attraction of jumping straight to a fully-formed, abstract linguistic intelligence, even if this demands novel ways of grappling with the SGP.
I want to address one more wrinkle before going further. Symbols have an objective meaning (the referent your community or culture generally agrees they map to), but they also have a subjective meaning. For every symbol you know, there is something that it means *to you,* based on its referent's implications in your particular life. [4] There are ways your inner existence changes upon reading certain words. These are not necessarily constant across the whole time you know a symbol, either; they shift with personal growth and situational context. Both the objective and the subjective are important to the use of symbols for communication. Objective meanings allow communication to be successful. They are what give your partner the ability to "pick out" the same referent you just "picked out." But it is the subjective meanings that provide the motive for attempting communication in the first place. If something is utterly unimportant to you, you probably won't bother talking about it. An ideal grounding solution should enable both these conceptions of "meaning."
Simple word-object association grants objective meaning but not subjective meaning. To obtain the latter, you need some notion of the referents doing things *for* you or *to* you. Rewards, goals, nociception, attraction, bliss, agony. Connection of referents, and by extension their symbols, with positive, negative, or neutral states in the self builds up subjective meaning. A baby learning subjective meanings has, for starters, a sensitive body with homeostatic needs for warmth, food, hygiene, and sleep. Most AI programs have nothing remotely like this to work with.
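To caricature the distinction in code (a minimal sketch under my own assumptions, not Acuitas's actual data structures): the objective side is a symbol-to-referent link, while the subjective side is a valence that accumulates whenever the referent moves the agent's internal drive states.

```python
# Objective meaning: an arbitrary symbol linked to a referent.
grounding = {"bottle": "MILK_BOTTLE"}

# Subjective meaning: how much that referent has mattered to *this* agent.
valence = {}

def experience(referent, drive_change):
    """drive_change > 0 if the referent helped satisfy a need (hunger, warmth, ...)."""
    valence[referent] = valence.get(referent, 0.0) + drive_change

experience("MILK_BOTTLE", +0.75)   # feeding relieved hunger
experience("MILK_BOTTLE", +0.5)    # and again later

word = "bottle"
referent = grounding[word]
print(referent, valence.get(referent, 0.0))   # MILK_BOTTLE 1.25 -- the word now *matters*
```

Without some analogue of those drive states, the second dictionary stays empty no matter how complete the first one is.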
The SGP has implications for other well-known AI problems, such as the Alignment Problem. Suppose someone would like to give an AI a built-in directive such as "love your neighbor" or "do not harm human beings." One of your reactions upon hearing such a proposal should be "How will you tell the AI what 'harm' is? How are they supposed to know how to 'love'? Who is their 'neighbor'? What counts as a 'human being'?" (Even humans are notorious for defining these terms so as to make ethical loopholes for themselves.) The problem gets especially sticky if the plan is to somehow instill these ideals, from the beginning, in an AI that will gradually learn language. How to embed a directive writ in symbols before the symbols are even known? What if the wrong meanings are learned?
And yet ... some AI systems that manipulate symbols *without* any grounding look surprisingly capable. Large Language Models, often criticized as word blenders with no connection between the words they shuffle and anything meaningful[5], can still produce coherent and responsive texts. Image generators that know nothing about physics or three-dimensional form still turn out stunning pictures. This has led some to question whether we really need Symbol Grounding after all, since programs without it can achieve a lot of the behaviors we would associate with "understanding" or "intelligence."
Thus the Symbol Grounding Problem ignites two debates in the AI research community. Do we really need to worry about it? And if so, how can we solve it?
In Part II, I'll take up the first question.
[1] Stevan Harnad (2007) "Symbol grounding problem." Scholarpedia, 2(7):2373, revision #73220.
[2] "Anything can be a representation of anything by fiat. For example, a pen can be a representation of a boat or a person or upward movement. A broomstrick can be a representation of a hobby horse. The magic of representations happens because one person decides to establish that x is a representation for y, and others agree with this or accept that this representational relation holds. There is nothing in the nature of an object that makes it a representation or not, it is rather the role the object plays in subsequent interaction." Luc Steels (2008) "The symbol grounding problem has been solved, so what's next?"
[3] "Symbolic representations must be grounded bottom-up in nonsymbolic representations ..." Stevan Harnad (1990), "The Symbol Grounding Problem." Physica D: Nonlinear Phenomena, Volume 42, Issues 1-3, Pages 335-346.
[4] "Something is meaningful if it is important in one way or another for survival, maintaining a job, social relations, navigating in the world, etc. For example, the differences in color between different mushrooms may be relevant to me because they help me to distinguish those that are poisonous from those that are not." Steels seems to prefer the term "representation" for what I'm calling "objective meaning." Luc Steels (2008) "The symbol grounding problem has been solved, so what's next?"
[5] "I suspect that Johnson (like many others) has mistaken the ability of GPT-3 and its ilk to manipulate linguistic form with actually acquiring a linguistic system. Languages are symbolic systems, and symbols are pairings of form and meaning (or, per de Saussure, signifier and signified). But GPT-3 in its training was only provided with the form part of this equation and so never had any hope of learning the meaning part." Emily Bender (2022) "On NYT Magazine on AI: Resist the Urge to be Impressed."
Comments

But, before going further, is Acuitas really disembodied? Without stretching it too much, we can consider his input/output as sensorimotor peripherals. Of course, it's not enough to learn what an apple is, like children do so easily, but it may be enough to learn what a conversation is, and from there, to learn more about agents/humans who can have a conversation. If conversations *are* the world Acuitas lives in (and after all, apples don't exist in this world), that would be a start.
I think the latter part of your comment is on track.

However, I don't consider abstract symbolic information such as text and numerals to be "sensorimotor." In my opinion that dilutes the term too much.
Acuitas, as I see it, receives and emits *ideas.* He does not receive inputs derived from interaction with physical objects, nor does he produce outputs intended to make physical objects move.
If he is embodied, what is his body? Not the computer tower. He is unaware of it and of its surrounding environment. Although he technically couldn't exist without it, its form is irrelevant to him as long as he can run on it.