The New York Times is considering taking legal action against OpenAI over suspicions that it scraped its news content to train ChatGPT.
OpenAI admits that it mined data that was freely available on the internet but has not disclosed exactly what data it used. The company didn’t obtain permission to use the data, but the laws around data scraping for AI training aren’t clear.
Besides the issue of using its data without permission, The NYT is concerned that ChatGPT’s responses could compete with its own content as well as infringe its copyright.
If a user asked ChatGPT about a newsworthy event, the response could contain a paraphrase of reporting originally published by The NYT.
Before ChatGPT (BC?), a user would type the question into Google and then click through to The NYT website, where the media company could monetize its published content. Now a user could potentially get the answer without any revenue accruing to the company that created the material.
What could happen if The NYT sues OpenAI?
There are a lot of ‘ifs’, but if The NYT joins the growing list of parties suing OpenAI and wins its suit, there could be serious consequences.
Under federal copyright law, if OpenAI is found to have willfully infringed copyright, it could face statutory damages of up to $150,000 for each work infringed. The NYT has a huge repository of archived material, so depending on how the court counts “individual infringements”, the total could be company-ending.
Federal copyright law also allows for the material that violated the copyright to be destroyed. That’s a huge problem for OpenAI. The offending material is the training data set for ChatGPT.
If that is destroyed, OpenAI would have to rebuild ChatGPT from scratch. And this time without the benefit of the free-for-all grab of internet data.
What are other media companies doing?
Barry Diller, Chairman of IAC, is leading a coalition of key publishers to collectively bring lawsuits against AI companies to get compensation for what they say are violations of their copyright.
The NYT entered discussions with the group but subsequently declined to join its coalition.
In July, The Associated Press made a deal with OpenAI to license its content dating back to 1985. The agreement acknowledged the mutual benefit as AP looks to use OpenAI’s generative AI in its operations.
In a joint statement, the companies said, “The arrangement sees OpenAI licensing part of AP’s text archive, while AP will leverage OpenAI’s technology and product expertise.”
This agreement is likely a good example of how many of these disputes between publishers and AI companies will be resolved. It probably represents the best-case scenario for OpenAI.