Tuesday, April 30, 2024

Acuitas Diary #71 (April 2024)

This past month, I started adding proper support for "issue trees," a feature whose absence has pained me as I've worked on the Narrative and Game Engines. Problems and goals seem to naturally exist in a hierarchy: any problem spawns a plan for solving it, which can contain new tasks or subproblems that require their own solutions, and so on until one reaches atomic actions that can be performed without issue. The Narrative understanding code already included some procedures for inferring extra issues from those explicitly stated in a story. But after they were created, no connection was maintained between parent and child issues.

Star Trek still of a Borg cube floating in space. Caption says, "It's been a busy week."

As an example of the type of difficulty Acuitas could get into by not keeping record of issue relationships: sometimes issues get overcome by events. A subgoal is rendered invalid if the parent goal it serves is unexpectedly realized via some other method. Such a situation appeared in the Tron story. Flynn's primary goal is to obtain a metadata file which proves that he, not Dillinger, was the creator of several video games. (In my simplified version of the story, I rendered this as "Flynn wanted his videogames.") He tries to do this by disabling the Master Control Program and hacking into the Grid - a subgoal the MCP thwarts by "digitizing" him. After Flynn helps defeat the MCP through his adventures inside the Grid, the file he wanted simply prints out. So technically Flynn never completes his hacking attempt; he no longer needs to. In my original version of the story, I had to gloss over this by never actually stating that Flynn wanted to, or planned to, hack the Grid. If I did, Acuitas would treat this unfulfilled subgoal as either a loose end or an ultimate win for the villains, denying the story a fully happy ending.

So my work included adding the proper tree relationships, plus some code that would enforce the recursive cascade of issue deactivation when a problem is solved or a goal realized, and testing to be sure this worked correctly in Narrative and didn't break anything in the Game Engine. As part of this I made an inane test story to isolate the kind of scenario I had in mind:

"Michael was a human."
"Michael was hungry."
"Michael decided to eat food."
"But Michael did not have food."
"Michael was in a room."
"A table was also in the room."
"An apple was on the table."
"Michael decided to get the apple."
"But then Sarah walked into the room."
"Sarah had a pizza."
"Sarah gave the pizza to Michael."
"Michael ate the pizza."
"The end."

I first got this story working: once Sarah comes through with the pizza, the subgoal of picking up the apple properly deactivates, and the story is considered finished. Then I revised "Simple Tron" slightly, bringing it closer to my first draft, which explicitly registers hacking the Grid as one of Flynn's subgoals. That version also completed with a recognized "good ending" and no loose ends.
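
For anyone who wants a more concrete picture of the "recursive cascade," here is a bare-bones Python sketch of the idea. The class and names are just illustrations for this post, not the actual Acuitas code (which is considerably more involved):

    class Issue:
        """A problem or goal in a story, possibly serving a parent issue."""
        def __init__(self, description, parent=None):
            self.description = description
            self.parent = parent
            self.subissues = []       # child problems/subgoals spawned by this one
            self.active = True
            if parent is not None:
                parent.subissues.append(self)

        def resolve(self):
            """Deactivate this issue and, recursively, everything beneath it.
            Subgoals that existed only to serve a goal that has now been met
            are overcome by events - not loose ends, not villain victories."""
            self.active = False
            for child in self.subissues:
                if child.active:
                    child.resolve()

    # The test story, reduced to a little tree:
    hunger = Issue("Michael is hungry")
    eat = Issue("Michael decides to eat food", parent=hunger)
    get_apple = Issue("Michael decides to get the apple", parent=eat)

    # Sarah's pizza satisfies the parent goal by another route...
    eat.resolve()
    print(get_apple.active)   # False - the apple subgoal is no longer pending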

I have also been pushing hard to get that previously mentioned knowledge representation refactoring finished. I got to the point of bringing the reformatted semantic database online and moving a lot of changes into the live code - but I did not quite get there, so if the Acuitas codebase were a business, it would have "pardon our dust" signs everywhere. He can at least read stories without crashing, and get through a rudimentary "Hello, my name is ..." conversation, but there are a lot of bugs for me to clean up yet. I'm planning to revise the Conversation area soon anyway, though, so maybe it's okay?

Until the next cycle,
Jenny

Saturday, April 13, 2024

AI Ideology IV: Algorithmic Bias

I'm in the midst of a blog series on AI-related ideology and politics. In Part III, I considered some implications and pitfalls of the AI factions and their agendas. This part is about a specific hot-button issue: "algorithmic bias," which has some contentious race-related associations.

An image model's attempt to produce an infographic about AI mistakes, which is itself mostly full of garbled text and other errors. Generated by @zacshaw on Twitter.

In recent years, AI (largely of the Artificial Neural Network variety) has been gradually making inroads into various decision-making roles: assessing job applicants, screening potential homebuyers, detecting fraudulent use of social services, and even helping to diagnose medical patients. Numerous concerns [1][2][3] have been raised that these systems are biased - i.e., that they unfairly reject qualified people, or accept unqualified people, on the basis of characteristics irrelevant to the decision. This is particularly worrying for a couple of reasons.

First, handing an important decision off to an AI system removes the details of how that decision was made from human supervision. Typical ANN systems are notoriously opaque. In effect, they make decisions by comparing the case under present consideration to patterns or associations found in their training data. But they are not naturally good at supplying a logical breakdown of how a decision was reached: which features of the present case matched the training material, how they were weighted, and so on. (The "explainable AI" research field is seeking to ameliorate this.) So, say your job application or attempt to access medical treatment gets denied by an algorithm. It's possible that no one knows exactly why you were denied, and no one can be held accountable for the decision, either. The magic box pronounced you unworthy, and that's the end of it. Faulty automated systems (from an earlier era than the current crop of ANN-based tools) have even sent people to prison for non-existent crimes. [4]

Second, some people are inclined by default to trust an AI system's decision more than a human's. It's just a computer doing deterministic calculations, right? It doesn't have emotions, prejudices, ulterior motives, conflicts of interest, or any of the weaknesses that make humans biased, right? So the expectation is that all its decisions will be objective. If this expectation does not hold, members of the public can be blindsided by unfair AI decisions.

And in fact, some are so convinced of these default assumptions that they insist the whole idea of algorithmic bias must be made up. "Math can't be biased." The algorithms, they say, are just acting on the facts (embodied in the training data). And if the facts say that members of one group are more likely to be qualified than another ... well, maybe a skewed output is actually fair.

Although mathematics and algorithms do, in truth, know nothing of human prejudice, algorithmic bias is quite real. Let's start by looking at an example without any especially controversial aspects. There was a rash of projects aimed at using AI to diagnose COVID-19 through automated analysis of chest X-rays and CT scans. Some of these failed in interesting ways.

"Many unwittingly used a data set that contained chest scans of children who did not have covid as their examples of what non-covid cases looked like. But as a result, the AIs learned to identify kids, not covid.

"Driggs’s group trained its own model using a data set that contained a mix of scans taken when patients were lying down and standing up. Because patients scanned while lying down were more likely to be seriously ill, the AI learned wrongly to predict serious covid risk from a person’s position.

"In yet other cases, some AIs were found to be picking up on the text font that certain hospitals used to label the scans. As a result, fonts from hospitals with more serious caseloads became predictors of covid risk." [5]

All these examples are cases of the AI mistaking a correlation (which happens to exist only in its limited training dataset) for a causative factor. Unlike experienced doctors - who know full well that things like label fonts have nothing to do with causing disease, and are thus chance associations at best - these ANI systems have no background knowledge about the world. They have no clue about the mechanisms that produced the data they're being trained upon. They're just matching patterns, and one pattern is as good as another.
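
To make that failure mode concrete, here is a toy illustration of my own (not taken from the study): train a tiny classifier on data in which a "label font" feature happens to track the diagnosis, and it will lean on that shortcut - then fall apart as soon as the coincidence goes away.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000

    def make_scans(font_tracks_label):
        """Two features: a weak genuine signal, and a 'label font' flag."""
        y = rng.integers(0, 2, n)                  # 0 = healthy, 1 = sick
        signal = y + rng.normal(0, 1.5, n)         # real but noisy evidence
        if font_tracks_label:
            font = y.astype(float)                 # font coincidentally matches labels
        else:
            font = rng.integers(0, 2, n).astype(float)
        return np.column_stack([signal, font]), y

    # Train a bare-bones logistic regression on the coincidence-laden data.
    X, y = make_scans(font_tracks_label=True)
    w, b = np.zeros(2), 0.0
    for _ in range(2000):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= 0.5 * (X.T @ (p - y)) / n
        b -= 0.5 * np.mean(p - y)
    print("weights [signal, font]:", w)            # the font weight dominates

    # At a hospital where the font has nothing to do with illness:
    X_new, y_new = make_scans(font_tracks_label=False)
    print("accuracy without the shortcut:", np.mean(((X_new @ w + b) > 0) == y_new))

Nothing in the training procedure tells the model that the font is a coincidence and the faint signal is the medicine; to the algorithm, a pattern is a pattern.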

Now imagine that an AI grabs onto a correlation with race or gender, instead of poses or fonts. That doesn't make the person's race or gender meaningful to the question being answered - not any more than label fonts are meaningful to an accurate determination of illness. But the AI will still use them as deciding factors.

The COVID-19 diagnosis summary also comments on another type of failure:

"A more subtle problem Driggs highlights is incorporation bias, or bias introduced at the point a data set is labeled. For example, many medical scans were labeled according to whether the radiologists who created them said they showed covid. But that embeds, or incorporates, any biases of that particular doctor into the ground truth of a data set. It would be much better to label a medical scan with the result of a PCR test rather than one doctor’s opinion, says Driggs. But there isn’t always time for statistical niceties in busy hospitals." [6]

If an ANN's training data contains examples of human decisions, and those decisions were prejudiced or otherwise flawed, the AI algorithm (despite having no human weaknesses in itself) will automatically inherit the bad behavior. It has no way to judge those prior choices as bad or good, no concept of things it should or shouldn't learn. So rather than achieving an idealized objectivity, it will mimic the previous status quo ... with less accountability, as already noted.

An instance of the Anakin/Padme meme. Anakin says "We're using AI instead of biased humans." Padme, looking cheerful, says "What did you train the AI on?" Anakin says nothing and gives her a deadpan look. Padme, now looking concerned, says again "What did you train the AI on?"

So. Training an AI for criminal sentencing? It's only going to be as objective as the judges whose rulings you put in the training set. Training it for job screening using a set of past resumes, hiring decisions, and performance ratings? It's going to mimic those previous hiring decisions and ratings, whether they fairly assessed who was qualified or not.

As a consequence of this effect, you can get (for example) a racially biased AI model without the end users or anyone on the development team actually being racist. All it takes is racism as a driving factor behind enough scenarios in the training data. And has racism historically been an issue? Of course. So it can be difficult to construct uncontaminated training sets from records of past decisions. Nobody really thinks an AI model can be racist in the same manner as a racist person ... but that doesn't mean it can't output decisions that treat people differently on the basis of irrelevant genetic or cultural attributes. As Gary Marcus says, "LLMs are, as I have been trying to tell you, too stupid to understand concepts like people and race; their fealty to superficial statistics drives this horrific stereotyping." [7]

Unfortunately, my current impression of efforts to fix algorithmic bias is that they aren't always addressing the real problem. Cleansing large datasets of preexisting biases or irrelevant features, and collecting more diverse data to swamp out localized correlations, is hard. Pursuing new AI architectures that are more particular about how and what they learn would be harder. Instead, a common approach is to apply some kind of correction to the output of the trained model. When Google's image labeling AI misidentified some Black people in photos as "gorillas," Google "fixed" it by not allowing it to identify anything as a gorilla. [8][9] Known biases in a model's training set can be mitigated by applying an opposite bias to the model's output. But such techniques could make matters even worse if executed poorly. [10]
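
Reduced to its bare bones, that kind of output-side correction looks something like the sketch below. (This is purely illustrative; the group names and numbers are invented, not drawn from any particular product.)

    def corrected_decision(score, group, thresholds):
        """Accept if the raw model score clears a per-group threshold chosen
        to offset a skew that audits have found in the model's outputs."""
        return score >= thresholds[group]

    # If the model is known to under-score applicants from group "B",
    # group B's threshold can be lowered to compensate.
    thresholds = {"A": 0.50, "B": 0.42}
    print(corrected_decision(0.45, "A", thresholds))   # False
    print(corrected_decision(0.45, "B", thresholds))   # True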

OpenAI's approach with ChatGPT was to use RLHF (Reinforcement Learning from Human Feedback) to create another layer of training that filters offensive or potentially dangerous material from the output of the base model. Human workers assigned the RLHF layer "rewards" for "good" outputs or "punishments" for "bad" ones - at the cost of their own mental health, since they were charged with looking at horrific content in order to label it. [11] Clever users have still found ways to defeat the RLHF and finagle forbidden content out of the model. AI enthusiasts sometimes depict large language models as a shoggoth - an inscrutable tentacled monster - wearing a tiny smiley-face mask. The shoggoth is the model's incomprehensible "thinking"; the mask is the RLHF. [12]

"Shoggoth Meme Explainer," showing the headline of the referenced New York Times article, above a pair of cartoon shoggoths. One is labeled GPT-3. Commentary text says "The body: 'AIs are alien minds' (we 'grow them' but don't know what they're really thinking). The other shoggoth, which has a yellow smiley face mask strapped on a part that might be viewed as the head, is labeled GPT-3 + RLHF. Commentary text says "The mask: early versions were horrifying, so we trained them to *act* nice and human-like. *Act.*"

Algorithmic bias, then, remains a known, but incompletely addressed, issue with the ANN/ML systems popular today.

In Part V of this series, I will start my examination of existential risks from AI.

[1] Giorno, Taylor. "Fed watchdog warns AI, machine learning may perpetuate bias in lending." The Hill. https://thehill.com/business/housing/4103358-fed-watchdog-warns-ai-machine-learning-may-perpetuate-bias-in-lending/

[2] Levi, Ryan. "AI in medicine needs to be carefully deployed to counter bias – and not entrench it." NPR. https://www.npr.org/sections/health-shots/2023/06/06/1180314219/artificial-intelligence-racial-bias-health-care

[3] Gilman, Michele. "States Increasingly Turn to Machine Learning and Algorithms to Detect Fraud." U.S. News & World Report. https://www.usnews.com/news/best-states/articles/2020-02-14/ai-algorithms-intended-to-detect-welfare-fraud-often-punish-the-poor-instead

[4] Brodkin, Jon. "Fujitsu is sorry that its software helped send innocent people to prison." Ars Technica. https://arstechnica.com/tech-policy/2024/01/fujitsu-apologizes-for-software-bugs-that-fueled-wrongful-convictions-in-uk/

[5] Heaven, Will Douglas. "Hundreds of AI tools have been built to catch covid. None of them helped." MIT Technology Review. https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/

[6] Heaven, "Hundreds of AI tools have been built to catch covid."

[7] Marcus, Gary. "Covert racism in LLMs." Marcus on AI (blog). https://garymarcus.substack.com/p/covert-racism-in-llms

[8] Vincent, James. "Google ‘fixed’ its racist algorithm by removing gorillas from its image-labeling tech." The Verge. https://www.theverge.com/2018/1/12/16882408/google-racist-gorillas-photo-recognition-algorithm-ai

[9] Rios, Desiree. "Google’s Photo App Still Can’t Find Gorillas. And Neither Can Apple’s." The New York Times. https://www.nytimes.com/2023/05/22/technology/ai-photo-labels-google-apple.html

[10] Wachter, Sandra, Mittelstadt, Brent, and Russell, Chris. "Health Care Bias Is Dangerous. But So Are ‘Fairness’ Algorithms." Wired. https://www.wired.com/story/bias-statistics-artificial-intelligence-healthcare/

[11] Kantrowitz, Alex. "He Helped Train ChatGPT. It Traumatized Him." CMSWire. https://www.cmswire.com/digital-experience/he-helped-train-chatgpt-it-traumatized-him/

[12] Roose, Kevin. "Why an Octopus-like Creature Has Come to Symbolize the State of A.I." The New York Times. https://www.nytimes.com/2023/05/30/technology/shoggoth-meme-ai.html