By Will Knight | 03.28.24
Hello and welcome to a potentially mind-altering edition of Fast Forward. This week we examine why AI chatbots that appear to display self-awareness are not what they seem.
Mirror, Mirror On the Wall, Is AI Self-Aware At All? 🪞🤖🤔
Over the past few weeks, some folks have invented a fascinating new game to play with AI chatbots: coaxing them into displaying what can seem like glimmers of self-awareness. Shortly after Anthropic introduced its upgraded chatbot Claude 3 this month, a researcher at the company posted snippets from a conversation in which the model appeared to recognize that it was being put to the test by its human interlocutor.
More demonstrations of what can be tempting to describe as self-awareness have followed. One involved repeatedly feeding screenshots of conversations with ChatGPT, Claude, and Gemini back to the chatbots until they showed signs of "recognizing" their own utterances. It is remarkably easy to perceive what feels like humanlike intelligence in the output of models designed to mimic us, as a paper posted last May by researchers at Google DeepMind lays out. The large language models, or LLMs, that power chatbots are trained on countless examples of humans referring to themselves, discussing their own self-awareness, and displaying other human behaviors such as charisma and deception. Chatbots like Claude and ChatGPT have analyzed enough examples in the trillions of words they were trained on to reflect these patterns back at us, triggering the theory of mind we use to think about the intentions and thoughts of others.
The DeepMind authors say that it is "imperative that we develop effective ways to describe their behavior … without falling into the trap of anthropomorphism." They suggest two ways of doing that. The first is to imagine chatbots as "role-playing" characters, primed to improvise based on whatever prompt they're given. The other, perhaps less accessible, approach is to consider LLMs as comprising a "superposition of simulacra within a multiverse of possible characters," that is, as agents that hop between personae with remarkable ease. "If the conceptual framework we use to understand other humans is ill-suited to LLM-based dialogue agents, then perhaps we need an alternative conceptual framework, a new set of metaphors that can productively be applied to these exotic mind-like artifacts," the authors write.
Interestingly, this is precisely how someone whose work involves trying to get LLMs to misbehave once described their mental model of the entity behind the prompt. When you reframe how you think of language models, it seems a lot less surprising that they might improvisationally produce something resembling self-awareness when pushed to do so. Things may get a lot more confusing as language models become more powerful and as they are fed data that includes more talk of AI exhibiting new capabilities. The sooner we can figure out the right mental models to properly describe and think about these systems, the more prepared we'll be for future advances.
|
|
A WIRED investigation uncovered coordinates collected by a controversial data broker that reveal sensitive information about visitors to an island once owned by Jeffrey Epstein, the notorious sex offender.
Startup Databricks just released DBRX, the most powerful open source large language model yet, eclipsing Meta's Llama 2.
|
|
New US emissions rules mean more plug-in hybrid cars are on the way. The electric vehicle tech is clean, but it has a catch.
Sam Bankman-Fried is finally facing punishment. Let's also put his ruinous philosophy on trial.
|
|
Amazon increases its stake in Anthropic, the OpenAI competitor behind the chatbot Claude, to $4 billion. (The Wall Street Journal)

Meanwhile, Stability AI, a company that made its name offering AI-generated images, seems to be in disarray after its CEO announced his departure. (Bloomberg)

Efforts to measure the tech rivalry between the United States and China are fraught with challenges, but according to one analysis, China is minting a lot more AI scientists. (The New York Times)

Intel and Microsoft are renewing their alliance with a plan to run AI Copilot on PCs rather than in the cloud. (Ars Technica)
|
That's all for this week. Thanks for reading! To help me make future editions of Fast Forward better, please fill out this survey. See you next week!
|
|
|