OpenAI says it has developed a watermarking method that accurately detects text written by ChatGPT but is still debating whether to release it.
Detecting AI-generated text has become increasingly difficult as LLMs become better at writing content. Educators face an uphill battle to determine whether their students completed assignments themselves or simply had ChatGPT write them.
In an updated blog post, OpenAI revealed that it has “developed a text watermarking method that we continue to consider as we research alternatives.”
The company says the method is highly accurate and resistant to localized tampering such as paraphrasing, though it admits it’s not foolproof. Running the text through a translation system or using another LLM to reword it defeats the watermark.
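OpenAI hasn’t said how its watermark works, but text watermarks described in the research literature typically nudge the model’s word choices toward a pseudorandom “green list” and then test how many green words a passage contains. The rough Python sketch below illustrates that detection statistic only; it is not OpenAI’s method, and every name in it is invented for the example. It also suggests why such schemes behave the way OpenAI describes: swapping a few words barely moves the score, while translating or rewording the whole passage resamples every word pair and pulls it back toward chance.

```python
import hashlib

def is_green(prev_word: str, word: str) -> bool:
    """Pseudorandomly place roughly half of all (previous word, word) pairs on a 'green list'."""
    digest = hashlib.sha256(f"{prev_word} {word}".lower().encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    """Detection statistic: the share of words that land on the green list."""
    words = text.split()
    if len(words) < 2:
        return 0.0
    hits = sum(is_green(prev, cur) for prev, cur in zip(words, words[1:]))
    return hits / (len(words) - 1)

# A watermarking sampler would favour green words during generation, pushing this
# fraction well above the ~0.5 expected of ordinary text. Editing a handful of
# words barely changes it, but rewording or translating the entire passage
# resamples every word pair and drags the score back toward 0.5.
print(green_fraction("The quick brown fox jumps over the lazy dog."))
```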
Asking an AI model to insert a special character like an emoji between each word and then deleting those characters also renders the watermark useless. However, these technical limitations aren’t the only reason OpenAI hasn’t released the feature.
ChatGPT has been an especially useful writing tool for non-native English speakers. OpenAI says its research shows that releasing the watermarking tool could disproportionately impact groups like these and stigmatize their use of AI as a writing aid.
Text metadata alternatives
OpenAI engineers are working on ways to use metadata as a text provenance method instead of watermarking. Images generated by DALL-E 3 already have C2PA metadata.
OpenAI says it’s too early to tell how effective adding metadata to AI-generated text would be, but the approach has some potential advantages. For one, the metadata is cryptographically signed, so there’s no risk of false positives.
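OpenAI hasn’t described how signed metadata would be attached to text, but the rough sketch below shows the general idea behind that “no false positives” claim using an ordinary digital signature; the function names and record format are made up for the example, not OpenAI’s design. A valid signature proves the text came from the provider unmodified, while missing or broken metadata proves nothing, so human-written text can never be wrongly flagged.

```python
# Illustrative text-provenance sketch (hypothetical design, not OpenAI's).
# Requires the third-party "cryptography" package: pip install cryptography
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The provider keeps the private key; verifiers only ever need the public key.
provider_key = Ed25519PrivateKey.generate()
public_key = provider_key.public_key()

def attach_provenance(text: str) -> dict:
    """Sign the text at generation time and bundle the signature as metadata."""
    return {"text": text, "signature": provider_key.sign(text.encode())}

def check_provenance(record: dict) -> str:
    """Verify the attached signature; absent or invalid metadata proves nothing."""
    signature = record.get("signature")
    if signature is None:
        return "no provenance claim"        # stripped metadata, screenshots, human text
    try:
        public_key.verify(signature, record["text"].encode())
        return "signed by provider"         # only unmodified provider output verifies
    except InvalidSignature:
        return "invalid signature"          # edited text or forged metadata

record = attach_provenance("Example model output.")
print(check_provenance(record))                                # signed by provider
print(check_provenance({"text": record["text"]}))              # no provenance claim
print(check_provenance({**record, "text": "Edited output."}))  # invalid signature
```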
The problem with using metadata is that it’s easily removed. OpenAI hasn’t explained how metadata would be applied to text, but removing C2PA metadata from AI-generated images is extremely simple.
Some social media platforms strip out metadata when images are uploaded, and simply taking a screenshot of an image circumvents C2PA. Would similar workarounds be effective against AI-generated text with metadata attached?
If ChatGPT generated text and added metadata to it, you could take a screenshot of the text, upload it to ChatGPT, and have it convert the image to text. Goodbye metadata.
Bad for business
The other reason OpenAI may be hesitant to release the tool is that it only detects text generated by ChatGPT. If users know their AI-generated content will be easily spotted, they’ll quickly move from ChatGPT to another platform.
The Wall Street Journal reported that OpenAI’s tool has been ready for release for a year and was 99% effective. The report said, “In trying to decide what to do, OpenAI employees have wavered between the startup’s stated commitment to transparency and their desire to attract and retain users.”
A global survey commissioned by OpenAI showed that the idea of an AI detection tool was supported by a margin of 4 to 1. However, an internal survey found that nearly a third of ChatGPT users would be put off by an AI-text detector.
Users want AI-generated content to be easy to spot, as long as it’s not content they’ve generated.