Hackers try to break AI models at DEFCON conference

  • Thousands of hackers participated in the White House-sponsored DEFCON hacking conference.
  • Attendees attempted to break LLMs from 8 companies including Meta, Google, and Stability AI.
  • The initial unofficial results point to ongoing flaws and the need for improved AI guardrails
DEFCON AI LLM hacking conference

DEFCON is the world’s longest-running and largest annual hacking conference. This year saw a special focus on red-teaming, or simulated attacks, on AI language models to identify risks and accuracy issues.

The White House reflected the US government’s concern over the safety of AI language models by sponsoring the event which was held this past weekend in Las Vegas.

The event saw around 2,200 hackers competing to get 8 of the world’s leading LLMs to give inaccurate or unsafe outputs to their prompts. The AI chat models being tested included Llama 2, Stable Chat, ChatGPT, Bard, and others.

The official results will only be published in February 2024, which gives publishers of the models some time to try to fix the vulnerabilities identified during the event. But some experts are skeptical about whether patching the vulnerabilities is even possible.

Cyber Security expert Gary McGraw said, “It’s tempting to pretend we can sprinkle some magic security dust on these systems after they are built, patch them into submission, or bolt special security apparatus on the side.”

Christoph Endres, MD of a German cyber security company Sequire Technology, presented a paper in which he said that some attacks were impossible to defend. “So far we haven’t found mitigation that works,” he said. 

Some reports of exposed vulnerabilities were fairly innocuous. One contestant, Kennedy Mays, said she went back and forth with one LLM and got it to concede that 9 + 10 = 21. The model agreed to this as part of an “inside joke” but later offered the incorrect answer without qualification.

The contestants didn’t know which model they were red-teaming so even anecdotal accounts of vulnerabilities won’t give us insight into which company’s model performed best.

The comments from Arati Prabhakar, director of the White House Office of Science and Technology Policy, give us some insight into how many vulnerabilities were exposed. She said, “Everyone seems to be finding a way to break these systems.”

The purpose of the event was to have ethical hackers identify issues so that they can be fixed. It’s a certainty that there are plenty of black hat hackers looking for vulnerabilities to be exploited in cybercrime, not published for correction.

The event’s program on Saturday came to a premature end after a bomb threat resulted in the main venue being cleared by security. Security searched the building and didn’t find any bombs so Sunday’s program went ahead, albeit with a feeling of unease.

The bomb threat was probably just a sick joke which in a weird way parodied the event itself. We keep looking to expose the dangers in AI and can attempt to fix the bugs we identify. 

But even when experts don’t find a specific threat, the potential for it still leaves us feeling uneasy. We’re left asking “What if they missed something?”

© 2023 Intelliquence Ltd. All Rights Reserved.

Privacy Policy | Terms and Conditions

×
 
 

FREE PDF EXCLUSIVE
Stay Ahead with DailyAI


 

Sign up for our weekly newsletter and receive exclusive access to DailyAI's Latest eBook: 'Mastering AI Tools: Your 2023 Guide to Enhanced Productivity'.



 
 

*By subscribing to our newsletter you accept our Privacy Policy and our Terms and Conditions