Hackers try to break AI models at DEFCON conference

August 15, 2023

DEFCON AI LLM hacking conference

DEFCON is the world’s longest-running and largest annual hacking conference. This year saw a special focus on red-teaming, or simulated attacks, on AI language models to identify risks and accuracy issues.

The White House reflected the US government’s concern over the safety of AI language models by sponsoring the event which was held this past weekend in Las Vegas.

The event saw around 2,200 hackers competing to get 8 of the world’s leading LLMs to give inaccurate or unsafe outputs to their prompts. The AI chat models being tested included Llama 2, Stable Chat, ChatGPT, Bard, and others.

The official results will only be published in February 2024, which gives publishers of the models some time to try to fix the vulnerabilities identified during the event. But some experts are skeptical about whether patching the vulnerabilities is even possible.

Cyber Security expert Gary McGraw said, “It’s tempting to pretend we can sprinkle some magic security dust on these systems after they are built, patch them into submission, or bolt special security apparatus on the side.”

Christoph Endres, MD of a German cyber security company Sequire Technology, presented a paper in which he said that some attacks were impossible to defend. “So far we haven’t found mitigation that works,” he said. 

Some reports of exposed vulnerabilities were fairly innocuous. One contestant, Kennedy Mays, said she went back and forth with one LLM and got it to concede that 9 + 10 = 21. The model agreed to this as part of an “inside joke” but later offered the incorrect answer without qualification.

The contestants didn’t know which model they were red-teaming so even anecdotal accounts of vulnerabilities won’t give us insight into which company’s model performed best.

The comments from Arati Prabhakar, director of the White House Office of Science and Technology Policy, give us some insight into how many vulnerabilities were exposed. She said, “Everyone seems to be finding a way to break these systems.”

The purpose of the event was to have ethical hackers identify issues so that they can be fixed. It’s a certainty that there are plenty of black hat hackers looking for vulnerabilities to be exploited in cybercrime, not published for correction.

The event’s program on Saturday came to a premature end after a bomb threat resulted in the main venue being cleared by security. Security searched the building and didn’t find any bombs so Sunday’s program went ahead, albeit with a feeling of unease.

The bomb threat was probably just a sick joke which in a weird way parodied the event itself. We keep looking to expose the dangers in AI and can attempt to fix the bugs we identify. 

But even when experts don’t find a specific threat, the potential for it still leaves us feeling uneasy. We’re left asking “What if they missed something?”

Join The Future


SUBSCRIBE TODAY

Clear, concise, comprehensive. Get a grip on AI developments with DailyAI

Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.

×

FREE PDF EXCLUSIVE
Stay Ahead with DailyAI

Sign up for our weekly newsletter and receive exclusive access to DailyAI's Latest eBook: 'Mastering AI Tools: Your 2024 Guide to Enhanced Productivity'.

*By subscribing to our newsletter you accept our Privacy Policy and our Terms and Conditions