OpenAI has confirmed a licensing agreement with The Associated Press (AP) that lets it train its AI models on AP’s news story archive.
Under the agreement, AP will give OpenAI access to its trove of text stories for AI training. In return, OpenAI will extend its technology to AP, enabling the news agency to integrate generative AI into its workflows.
OpenAI will have the right to scrape data from AP’s story archive stretching back to 1985.
“Generative AI is a fast-moving space with tremendous implications for the news industry. We are pleased that OpenAI recognizes that fact-based, nonpartisan news content is essential to this evolving technology, and that they respect the value of our intellectual property,” said Kristin Heitmann, AP senior vice president and chief revenue officer.
The practice of using public internet data to train AI systems is becoming a point of contention, and that dispute will likely make private and sponsored deals of this kind more popular.
The large language models (LLMs) powering chatbots at OpenAI, Google, and other companies have been trained on a colossal amount of data gathered from publicly accessible internet sources.
This includes third-party content like news articles, Wikipedia entries, and comments from social media and blogs, all taken without the explicit permission or awareness of the authors.
This practice raises legal and ethical challenges, as it’s improbable that all of this data was collected legally. At the very least, AI training data warps the meaning of ‘open’ and ‘publicly accessible.’
Andres Sawicki, a professor of intellectual property law at the University of Miami, commented, “The data sets include a lot of content that is copyrighted. The copyright holders do not approve of these exploitations. It’s not hard to conceive of more deals like the AP one being made between tech firms and content producers in an effort to build a ‘clean database.’ The problem is that the data sets needed to train the models are so massive that I doubt it will be possible to secure permission from a sufficient number of owners to make the technology practical.”
This week, the US Federal Trade Commission (FTC) launched an investigation into OpenAI’s practices surrounding the use of data in model training. The FTC has demanded documentation from OpenAI to understand its strategies and identify any non-compliance.
OpenAI and AP have expressed positive sentiments about the partnership, stating they “believe in the responsible creation and use of these AI systems.”