Authors Mona Awad and Paul Tremblay have filed a lawsuit against OpenAI, the creator of ChatGPT.
Mona Awad, author of 13 Ways of Looking at a Fat Girl, and Paul Tremblay, author of The Cabin at the End of the World, allege that the AI platform infringed upon their copyrights by training its model on their work without consent.
ChatGPT is trained on publicly available data from the internet. Yet, according to Awad and Tremblay, their copyrighted novels were illicitly used to train ChatGPT, a suspicion raised due to the chatbot’s ability to generate “very accurate summaries” of their books, as stated in the lawsuit documents.
The authors argue that when they asked ChatGPT to provide summaries of their novels, the AI returned information that isn’t obviously publicly available. While ChatGPT could learn from summaries published on sites like Wikipedia, these don’t provide sufficient detail to justify the depth of the AI’s summary. The plaintiffs provide exhibits to demonstrate this.
This is the first lawsuit of its kind against OpenAI concerning copyright, says Andres Guadamuz, a reader in intellectual property law at the University of Sussex.
That said, it joins a long line of legal cases lodged against OpenAI, though most of those concern libel rather than copyright.
Moreover, there have been several copyright cases in the realm of image generation, including one brought by Getty Images against Stability AI for allegedly using its images without permission or compensation.
AI use of “shadow libraries”
Saveri and Butterick, the lawyers representing the plaintiffs, claim that OpenAI has grown "increasingly secretive" regarding its training data, which is probably a fair comment. In the early development stages of ChatGPT, OpenAI revealed that it used a large "internet-based books corpora" for training, dubbed "Books2."
Given the estimated 294,000 titles in the dataset, the lawyers suspect OpenAI sourced the books from “shadow libraries” such as Library Genesis (LibGen) and Z-Library. So-called shadow libraries are digital repositories for books and academic papers.
While primarily intended for academic use, most contain illegally copied books.
The legality of shadow libraries remains contested, with little consensus across jurisdictions, yet they have become part of the internet's furniture.
The complaint says, “These flagrantly illegal shadow libraries have long been of interest to the AI-training community.”
In a report by the Guardian, Lilian Edwards, a professor of law, innovation and society at Newcastle University, conjectures that this particular case might hinge on whether courts consider the use of copyrighted material for AI training as ‘fair use.’ Definitions of fair use vary internationally, which further complicates the problem.
The odds of success? Likely small, but legal pressures are mounting on AI companies of all stripes.
Forthcoming AI regulations, like the EU AI Act, are set to enforce transparency guidelines that obligate developers to publish their training data.