In a bid to tackle deepfakes and misinformation, the seven AI companies that agreed to this week’s US voluntary AI framework have pledged to watermark AI-generated content.
Participants in the voluntary framework, announced by the White House on the 21st of July, include Google, OpenAI, Microsoft, Meta, Amazon, Anthropic, and Inflection.
As part of that framework, these companies have committed to developing watermarks to help the public identify the origin of AI-generated content and reduce deception.
Much like a conventional watermark, an AI watermark is a marker attached to an image, video, audio file, or piece of text to identify the content as AI-generated.
Watermarking AI-generated content could reduce scams, fake viral campaigns, and sextortion. “This watermark will let creativity with AI flourish but curtails the risks of fraud and deception,” said the White House.
In a blog post published shortly after the White House announcements, OpenAI detailed its agreement to “develop robust mechanisms, including provenance and/or watermarking systems for audio or visual content.” It will also develop “tools or APIs to establish if a piece of content was made with their system.”
Google plans to enhance the reliability of information by integrating metadata and “other innovative techniques” in addition to watermarking.
Various other safeguards were announced by the White House on Friday, including conducting internal and external testing on AI systems before release, increasing investment in cybersecurity, and promoting collaboration across the industry to reduce AI risks.
OpenAI said these commitments mark “an important step in advancing meaningful and effective AI governance, both in the US and around the world.”
The company also promised to “invest in research in areas that can help inform regulation, such as techniques for assessing potentially dangerous capabilities in AI models.”
Nick Clegg, Meta’s president of global affairs, echoed OpenAI’s sentiment, describing these commitments as an “important first step in ensuring responsible guardrails are established for AI.”
Will AI watermarks work?
Watermarking AI-generated content, while conceptually appealing, is far from foolproof.
Images, videos, and audio recordings can carry tiny, faint visual or audio markers that signal their AI-generated origins.
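As a purely illustrative sketch, the snippet below overlays a nearly transparent label on an image using the Pillow library; the filenames are placeholders, and real watermarking schemes are far more robust and typically imperceptible.

```python
# Illustrative only: a faint, nearly transparent label added with Pillow.
# Filenames are hypothetical; production watermarks are more sophisticated.
from PIL import Image, ImageDraw

img = Image.open("generated.png").convert("RGBA")
overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))
draw = ImageDraw.Draw(overlay)
# An alpha of 20 out of 255 keeps the mark barely visible.
draw.text((10, 10), "AI-generated", fill=(255, 255, 255, 20))
Image.alpha_composite(img, overlay).convert("RGB").save("generated_marked.jpg")
```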
Likewise, integrating metadata into AI-generated files can provide information about the source and creation process of the content. However, removing watermarks using other AI tools or stripping metadata will likely be straightforward.
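To show how little effort removal can take, here is a rough sketch, again using Pillow with hypothetical filenames, that re-saves only an image’s pixel data and thereby drops any EXIF or other embedded metadata.

```python
# Rough sketch: copying only pixel data leaves EXIF/provenance metadata behind.
# Filenames are hypothetical.
from PIL import Image

original = Image.open("ai_generated_with_metadata.jpg")
stripped = Image.new(original.mode, original.size)
stripped.putdata(list(original.getdata()))
stripped.save("stripped.jpg")  # no embedded metadata survives the copy
```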
If watermarks can be removed, then non-watermarked AI-generated content suddenly gains legitimacy. The absence of a watermark could be used to argue an image is genuine when it isn’t – a potentially dangerous side effect.
When it comes to AI-generated text, there’s no straightforward solution. Unlike images or audio, text doesn’t easily lend itself to embedding watermarks.
The primary approach here is passing text through AI detectors, which analyze the perplexity of the text, a measure of how predictable a sequence of words is to a language model, to estimate whether it’s AI-generated. Text the model finds highly predictable is treated as a sign of machine authorship.
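To make the idea concrete, here is a minimal sketch of perplexity scoring with GPT-2 via the Hugging Face transformers library; real detectors combine more signals, but the core calculation looks roughly like this.

```python
# Minimal sketch of perplexity scoring; not a production detector.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels supplied, the model returns the mean cross-entropy loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

# Lower scores mean the model predicts the text more easily, which
# detectors treat as a hint that it may be machine-generated.
print(perplexity("The quick brown fox jumps over the lazy dog."))
```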
AI detectors have their shortcomings. They often yield high rates of false positives, leading to non-AI-generated content being wrongly flagged.
This issue is magnified when analyzing text written by non-native English speakers, who might use less common phrasings or have atypical syntax, further increasing false positive rates. A recent study advised against the use of AI detectors in education and recruitment settings for these reasons.
Furthermore, as AI models evolve, the line between human-written and AI-written content becomes increasingly blurry. As AI improves at mimicking human writing styles, detectors based on perplexity will become less reliable.
While watermarking is a step towards improving transparency and accountability in AI-generated content, it’s not a straightforward solution and doesn’t serve as an ultimate deterrent or ‘silver bullet.’