OpenAI has responded to two near-identical class-action lawsuits brought against it by a number of authors and argued that most of the allegations should be dismissed.
The authors claim that their copyrights were infringed when the company used their work as training data for ChatGPT. The suits allege:
- direct copyright infringement
- vicarious copyright infringement
- removal of copyright management information (DMCA)
- unfair competition
- unjust enrichment
- negligence
OpenAI says that it’s happy to have the first allegation argued in court, but that the remaining five are wrong in law and should be dismissed.
The plaintiffs are claiming that because their books were used to train ChatGPT, everything that ChatGPT outputs is a derivative of their copyrighted work.
OpenAI says it will neither confirm nor deny that the books in question formed part of the training dataset. But it says that even if the books were included, they would have made up a tiny part of a huge amount of data.
OpenAI argued that this use falls under the fair use doctrine, under which it does not infringe copyright.
The authors may have a tough time convincing the court of their argument. They aren’t arguing that ChatGPT is writing books similar to theirs and in competition with them. They’re saying that every word ChatGPT puts out is a copyright violation.
In its motion to dismiss, OpenAI said, “According to the Complaints, every single ChatGPT output — from a simple response to a question (e.g., ‘Yes’), to the name of the President of the United States, to a paragraph describing the plot, themes, and significance of Homer’s The Iliad — is necessarily an infringing ‘derivative work’ of Plaintiffs’ books.”
That’s going to be hard to argue. The motion describes even ChatGPT’s accurate summaries of the books in question as resembling “book reports or reviews” rather than an attempt to profit from copyrighted work.
The details of the remaining arguments can be read in OpenAI’s motion to dismiss.
Why is OpenAI not asking for the entire case to be dismissed?
OpenAI’s motion says that it is not asking for the direct copyright infringement count to be dismissed, a count “which OpenAI will seek to resolve as a matter of law at a later stage of the case.”
OpenAI and Meta haven’t disclosed exactly what data they used to train their models, but that data almost certainly included a lot of copyrighted material, including the books of the authors filing the lawsuits.
If copyrighted data was included, then OpenAI was certainly aware of that. The company will argue that the intent behind accessing the material was not to produce derivative works and profit from them in competition with the originals.
If the court agrees, its decision will set a precedent that will be relied upon in a number of other AI-related lawsuits that are in progress.
A decision in OpenAI’s favor would mean that the company can’t be sued for having used copyrighted data in the past and that it can keep doing so to train its new models. And if the court decides that AI training falls under the fair use principle, then it’ll be open season for other AI companies too.
It’s a risky move, but at some point this issue will have to be resolved. If the decision goes against OpenAI, then it will have to scrap ChatGPT and start training it from scratch, this time with a far smaller set of data.