Hackers try to break AI models at DEFCON conference

DEFCON is the world’s longest-running and largest annual hacking conference. This year saw a special focus on red-teaming, or simulated attacks, on AI language models to identify risks and accuracy issues.

The White House reflected the US government’s concern over the safety of AI language models by sponsoring the event which was held this past weekend in Las Vegas.

The event saw around 2,200 hackers competing to get 8 of the world’s leading LLMs to give inaccurate or unsafe outputs to their prompts. The AI chat models being tested included Llama 2, Stable Chat, ChatGPT, Bard, and others.

The official results will only be published in February 2024, which gives publishers of the models some time to try to fix the vulnerabilities identified during the event. But some experts are skeptical about whether patching the vulnerabilities is even possible.

Cyber Security expert Gary McGraw said, “It’s tempting to pretend we can sprinkle some magic security dust on these systems after they are built, patch them into submission, or bolt special security apparatus on the side.”

Christoph Endres, MD of a German cyber security company Sequire Technology, presented a paper in which he said that some attacks were impossible to defend. “So far we haven’t found mitigation that works,” he said.

Some reports of exposed vulnerabilities were fairly innocuous. One contestant, Kennedy Mays, said she went back and forth with one LLM and got it to concede that 9 + 10 = 21. The model agreed to this as part of an “inside joke” but later offered the incorrect answer without qualification.

Findings won’t be made public until about February. And even then, fixing flaws in these digital constructs — whose inner workings are neither wholly trustworthy nor fully fathomed even by their creators — will take time and millions of dollars. #defcon https://t.co/JMKxykWmcE

— Chris Wysopal (@WeldPond) August 14, 2023

The contestants didn’t know which model they were red-teaming so even anecdotal accounts of vulnerabilities won’t give us insight into which company’s model performed best.

The comments from Arati Prabhakar, director of the White House Office of Science and Technology Policy, give us some insight into how many vulnerabilities were exposed. She said, “Everyone seems to be finding a way to break these systems.”

The purpose of the event was to have ethical hackers identify issues so that they can be fixed. It’s a certainty that there are plenty of black hat hackers looking for vulnerabilities to be exploited in cybercrime, not published for correction.

The event’s program on Saturday came to a premature end after a bomb threat resulted in the main venue being cleared by security. Security searched the building and didn’t find any bombs so Sunday’s program went ahead, albeit with a feeling of unease.

The bomb threat was probably just a sick joke which in a weird way parodied the event itself. We keep looking to expose the dangers in AI and can attempt to fix the bugs we identify.

But even when experts don’t find a specific threat, the potential for it still leaves us feeling uneasy. We’re left asking “What if they missed something?”

Hackers try to break AI models at DEFCON conference

Join The Future

Eugene van der Watt

RELATED POSTS

OpenAI announces “SearchGPT” to try and stay at the front of the pack

Meta releases Llama 3.1 models, sticks with open strategy

Senate probes OpenAI’s safety and governance after whistleblower claims

Google’s AI predicts weather using fraction of computing power

Hackers try to break AI models at DEFCON conference

Join The Future

Eugene van der Watt

RELATED POSTS

OpenAI announces “SearchGPT” to try and stay at the front of the pack

Meta releases Llama 3.1 models, sticks with open strategy

Senate probes OpenAI’s safety and governance after whistleblower claims

Google’s AI predicts weather using fraction of computing power

FREE PDF EXCLUSIVEStay Ahead with DailyAI

FREE PDF EXCLUSIVE
Stay Ahead with DailyAI