A new study reveals the complexities of the GDPR’s “right to be forgotten” (RTBF) in the context of AI.
Also known as the Right to Erasure, this gives individuals the power to demand tech firms permanently delete their personal data. Yet, in the context of large language models (LLMs) and AI chatbots, there’s no simple way to reverse engineer model training to remove specific data.
The right to be forgotten extends beyond Europe’s GDPR. Comparable provisions exist in California (CCPA), Japan (APPI), and several other jurisdictions. Originally, RTBF procedures were designed primarily for search engines, making it relatively simple for companies like Google and Microsoft to locate and remove specific data from their web indexes.
Researchers from Data61, the arm of Australia’s national science agency (CSIRO) specializing in AI, robotics, and cybersecurity, explored RTBF for AI in a recent study.
They aimed to investigate whether and how RTBF could function in a new era of AI models trained on vast quantities of data scraped from the internet. This data contains names and likely other personally identifiable information (PII).
In some cases, AIs may output incorrect or even libelous information about people. In recent months, OpenAI has been embroiled in multiple libel cases, with its outputs alleging that one man committed fraud and another sexual assault; neither claim is true.
In any such situation, deleting the data underlying the false allegations should be the absolute minimum.
However, as pointed out by the researchers, machine learning (ML) algorithms are not as straightforward as search engines.
They highlight that LLMs store and process information “in a completely different way” compared to the indexing approach used by search engines.
And how do you even know if your personal data is contained in the model? According to the researchers, users can only acquire knowledge about their personal data within LLMs “by either inspecting the original training dataset or perhaps by prompting the model.” The latter is how Mark Walters, from Georgia, USA, discovered his name is linked to fraud in some of ChatGPT’s outputs.
ChatGPT said of Walters, “Mark Walters (‘Walters’) is an individual who resides in Georgia…Walters has breached these duties and responsibilities by, among other things, embezzling and misappropriating SAF’s funds and assets for his own benefit, and manipulating SAF’s financial records and bank statements to conceal his activities.”
While AI services pose challenges to the right to be forgotten, that doesn’t mean they’re absolved from respecting privacy rights.
The researchers survey several candidate strategies for removing data from trained AI models, including the “machine unlearning” SISA technique, inductive graph unlearning, and approximate data deletion, among others.
These methods could enable AI developers to reliably probe their datasets and remove specific data to uphold the RTBF.
Can you remove your data from AI models like ChatGPT?
OpenAI has introduced procedures for individuals to request the deletion of personal data in AI models and opt out of future data use for training AI.
Additionally, users can make a Data Subject Access Request (DSAR) to exercise GDPR-granted rights such as data correction, restriction, or transfer.
However, OpenAI noted that correcting inaccurate data generated by its models is currently infeasible, so deletion would likely be the solution.
Despite these mechanisms, OpenAI warned that it might deny or only partially act on requests based on legal constraints and balancing privacy requests against freedom of expression.
OpenAI also offers an opt-out for users who don’t want their data used for AI training via ChatGPT account settings.
OpenAI provides the following email address for correspondence on the matter: [email protected].
Of course, ChatGPT is not the only AI trained on open internet data. Anyone wishing to remove their personal information from all major public AI chatbots must contact each developer separately.
The reality is most data published on the internet is up for grabs for AI companies, and removing data from models is exceptionally challenging.