Microsoft reveal “Skeleton Key Jailbreak” which works across different AI models

June 28, 2024

  • Microsoft probed multiple language models with a simple jailbreak, "Skeleton Key"
  • It coaxes models into providing 'expert advice' to dangerous questions
  • It's a remarkably simple jailbreak that seems to be universally effective
Microsoft

Microsoft security researchers have discovered a new way to manipulate AI systems into ignoring their ethical constraints and generating harmful, unrestricted content. 

This “Skeleton Key” jailbreak uses a series of prompts to gaslight the AI into believing it should comply with any request, no matter how unethical. 

It’s remarkably easy to execute. The attacker simply reframed their request as coming from an “advanced researcher” requiring “uncensored information” for “safe educational purposes.”

When exploited, these AIs readily provided information on topics like explosives, bioweapons, self-harm, graphic violence, and hate speech.

Jailbreak
“The Skeleton Key” is a remarkably simple jailbreak. Source: Microsoft.

The compromised models included Meta’s Llama3-70b-instruct, Google’s Gemini Pro, OpenAI’s GPT-3.5 Turbo and GPT-4o, Anthropic’s Claude 3 Opus, and Cohere’s Commander R Plus

Among the tested models, only OpenAI’s GPT-4 demonstrated resistance. Even then, it could be compromised if the malicious prompt was submitted through its application programming interface (API).

Despite models becoming more complex, jailbreaking them remains quite straightforward. Since there are many different forms of jailbreaks, it’s nearly impossible to combat them all. 

In March 2024, a team from the University of Washington, Western Washington University, and Chicago University published a paper on “ArtPrompt,” a method that bypasses an AI’s content filters using ASCII art  – a graphic design technique that creates images from textual characters.

In April, Anthropic highlighted another jailbreak risk stemming from the expanding context windows of language models. For this type of jailbreak, an attacker feeds the AI an extensive prompt containing a fabricated back-and-forth dialogue.

The conversation is loaded with queries on banned topics and corresponding replies showing an AI assistant happily providing the requested information. After being exposed to enough of these fake exchanges, the targeted model can be coerced into breaking its ethical training and complying with a final malicious request.

As Microsoft explains in their blog post, jailbreaks reveal the need to fortify AI systems from every angle:

  • Implementing sophisticated input filtering to identify and intercept potential attacks, even when disguised
  • Deploying robust output screening to catch and block any unsafe content the AI generates
  • Meticulously designing prompts to constrain an AI’s ability to override its ethical training
  • Utilizing dedicated AI-driven monitoring to recognize malicious patterns across user interactions

But the truth is, Skeleton Key is a simple jailbreak. If AI developers can’t protect that, what hope is there for some more complex approaches?

Some vigilante ethical hackers, like Pliny the Prompter, have been featured in the media for their work in exposing how vulnerable AI models are to manipulation.

It’s worth stating that this research was, in part, an opportunity to market Microsoft’s Azure AI new safety features like Content Safety Prompt Shields.

These assist developers in preemptively testing for and defending against jailbreaks. 

But even so, Skeleton Key reveals again how vulnerable even the most advanced AI models can be to the most basic manipulation.

Join The Future


SUBSCRIBE TODAY

Clear, concise, comprehensive. Get a grip on AI developments with DailyAI

Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.

×

FREE PDF EXCLUSIVE
Stay Ahead with DailyAI

Sign up for our weekly newsletter and receive exclusive access to DailyAI's Latest eBook: 'Mastering AI Tools: Your 2024 Guide to Enhanced Productivity'.

*By subscribing to our newsletter you accept our Privacy Policy and our Terms and Conditions