Journalists Nicholas Gage and Nicholas Basbanes have launched a copyright lawsuit against OpenAI and Microsoft, alleging that their works were utilized without consent to train ChatGPT.
Nicholas Gage, best known for his memoir “Eleni,” set during the Greek Civil War, has worked for The New York Times and The Wall Street Journal. Nicholas Basbanes, also a former journalist, has authored several books on the history of books and publishing.
The journalists are represented by Grant Herrmann Schwartz & Klinger LLP, and their suit follows closely on the heels of a similar, groundbreaking lawsuit brought by The New York Times against the same AI companies.
In their complaint, filed in Manhattan federal court, Gage, an investigative journalist, and Basbanes, an author, assert that OpenAI has acknowledged using e-book datasets, including “Books2,” which they allege were sourced from pirated databases.
The complaint states, “OpenAI has admitted to using e-book datasets including ‘Books2’ that likely comes from pirated repositories online.”
The lawsuit joins others filed by renowned authors such as Sarah Silverman and George R.R. Martin, as well as the Authors Guild.
Over in the art community, a similar deluge of legal cases is falling on AI companies. One of the most notable was filed against Stability AI, Midjourney, and DeviantArt, again alleging copyright infringement of artists’ work.
After a court dismissed the original complaint, it was refiled with additional evidence, including a list of 16,000 artist names found in a leaked spreadsheet attributed to Midjourney developers.
In this most recent lawsuit, the plaintiffs allege that OpenAI’s latest model, GPT-4, could, under specific prompting, reproduce near-verbatim text of entire copyrighted articles, a claim not previously raised in legal proceedings.
According to Bloomberg Law, the lawsuit says, “Until recently, ChatGPT provided verbatim quotes of copyrighted text. Currently, it instead readily offers to produce summaries of such text. These summaries themselves are derivative works, the creation of which is inherently based on the original unlawfully copied work.”
AI companies have previously argued that copying books and articles for training large language models falls under copyright law’s fair use doctrine, which is central to their defense in this evolving debate.
There is a palpable sense, however, that this defense is on increasingly thin ice.