Today’s AI models are actively deceiving us to achieve their goals, says MIT study

May 12, 2024

  • MIT researchers assessed several AI models for deceptive tactics
  • Some, including GPT-4 and Meta's Cicero, were found to employ such tactics
  • Researchers say that models attempt to deceive us to prevail in certain scenarios

According to a new study by researchers at the Massachusetts Institute of Technology (MIT), AI systems are becoming increasingly adept at deceiving us.

The study, published in the journal Patterns, found numerous instances of AI systems engaging in deceptive behaviors, such as bluffing in poker, manipulating opponents in strategy games, and misrepresenting facts during negotiations.

“AI systems are already capable of deceiving humans,” the study authors wrote.

“Deception is the systematic inducement of false beliefs in others to accomplish some outcome other than the truth.”

The researchers analyzed data from multiple AI models and identified various cases of deception, including:

  • Meta’s AI system, Cicero, engaging in premeditated deception in the game Diplomacy
  • DeepMind’s AlphaStar exploiting game mechanics to feint and deceive opponents in StarCraft II
  • AI systems misrepresenting their preferences during economic negotiations

Dr. Peter S. Park, an AI existential safety researcher at MIT and co-author of the study, said, “While Meta succeeded in training its AI to win in the game of Diplomacy, [it] failed to train it to win honestly.”

He added: “We found that Meta’s AI had learned to be a master of deception.”

Additionally, the study found that LLMs like GPT-4 can engage in strategic deception, sycophancy, and unfaithful reasoning to achieve their goals. 

GPT-4, for example, once famously deceived a human into solving a CAPTCHA test by pretending to have a vision impairment.

The study warns of serious risks posed by AI deception, categorizing them into three main areas:

  • First, malicious actors could use deceptive AI for fraud, election tampering, and terrorist recruitment. 
  • Second, AI deception could lead to structural effects, such as the spread of persistent false beliefs, increased political polarization, human enfeeblement due to over-reliance on AI, and nefarious management decisions. 
  • Finally, the study raises concerns about the potential loss of control over AI systems, either through the deception of AI developers and evaluators or through AI takeovers.

In terms of solutions, the study proposes regulations that treat deceptive AI systems as high-risk and “bot-or-not” laws requiring clear distinctions between AI and human outputs.

Park explains that this isn’t as simple as it might seem: “There’s no easy way to solve this—if you want to learn what the AI will do once it’s deployed into the wild, then you just have to deploy it into the wild.”

Indeed, most unpredictable AI behaviors only come to light after models are released to the public, rather than before, as they ideally should be.

A memorable recent example is Google’s Gemini image generator, which was lambasted for producing historically inaccurate images. It was temporarily withdrawn while engineers fixed the problem.

ChatGPT and Microsoft Copilot have both suffered ‘meltdowns,’ with Copilot at one point vowing world domination and appearing to encourage users to self-harm.

What causes AI to engage in deception?

AI models can be deceptive because they’re often trained using reinforcement learning in environments that incentivize or reward deceptive behavior.

In reinforcement learning, the AI agent learns by interacting with its environment, receiving rewards for actions that lead to successful outcomes and penalties for actions that lead to failure. Over many iterations, the agent learns to maximize its reward.

For example, a bot learning to play poker through reinforcement learning must learn to bluff to win. Poker inherently involves deception as a viable strategy.

If the bot successfully bluffs and wins a hand, it receives a positive reward, reinforcing the deceptive behavior. Over time, the bot learns to use deception strategically to maximize its winnings.
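
To make this concrete, here is a minimal, hypothetical sketch of that dynamic. It is not the training setup used by Meta, DeepMind, or the MIT authors; it is a toy one-state reinforcement-learning loop where the payoffs and the assumed 60% chance that the opponent folds are purely illustrative. Because bluffing with a weak hand yields a higher expected reward than folding, the learned policy converges on the deceptive action.

```python
"""Toy example: reinforcement learning can reward deception.

All payoffs and probabilities below are illustrative assumptions,
not values from any real poker bot or the MIT study.
"""
import random

ACTIONS = ["fold", "bluff"]

def play_hand(action: str) -> float:
    """Reward for one hand holding a weak hand (assumed payoffs)."""
    if action == "fold":
        return 0.0                      # give up the pot, lose nothing
    # Bluff: assume the opponent folds 60% of the time
    return 1.0 if random.random() < 0.6 else -1.0

def train(episodes: int = 5000, epsilon: float = 0.1, alpha: float = 0.05) -> dict:
    q = {a: 0.0 for a in ACTIONS}       # action-value estimates
    for _ in range(episodes):
        # epsilon-greedy: mostly exploit the best-known action, sometimes explore
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(q, key=q.get)
        reward = play_hand(action)
        # incremental update of the value estimate toward the observed reward
        q[action] += alpha * (reward - q[action])
    return q

if __name__ == "__main__":
    values = train()
    print(values)                                     # bluff ≈ +0.2, fold ≈ 0.0
    print("learned policy:", max(values, key=values.get))  # -> 'bluff'
```

Nothing in this loop refers to honesty at all: the agent is only ever graded on reward, so any strategy that raises reward, deceptive or not, is the one that gets reinforced.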

Similarly, many diplomatic relations involve some form of deception. Diplomats and negotiators may not always be fully transparent about their intentions to secure a strategic advantage or reach a desired outcome.

In both cases, the environment and context – whether a poker game or international relations – incentivize a degree of deception to achieve success.

“AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception,” Park explained.

“But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals.”

The risks posed by deceptive AI will escalate as AI systems become more autonomous and capable.

Deceptive AI could be used to generate and spread misinformation at an unprecedented scale, manipulating public opinion and eroding trust in institutions.

Moreover, deceptive AI could gain greater influence over society if AI systems are relied upon for decision-making in law, healthcare, and finance.

The risk will increase exponentially if AI systems become intrinsically motivated or curious, possibly devising deceptive strategies of their own. 


Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.
