OpenAI has entered into a multi-year, $250 million agreement with Rupert Murdoch’s News Corp, which includes The Wall Street Journal, New York Post, The Times, and The Sunday Times.
As in a deal struck recently with the Financial Times, OpenAI will gain access to current and archived content to train and refine its AI models, while featuring links back to source articles.
These deals are typically billed as a “collaboration,” but at heart they are simple transactions. OpenAI gains high-quality written work, and News Corp is paid handsomely for it.
Only last year, News Corp and several other leading publishers joined a coalition to protect their services from unauthorized AI scraping. The tables have turned now that big money is on offer.
News Corp CEO Robert Thomson heralded the alliance as a historic moment, stating in a press release, “We believe this agreement will set new standards for veracity, virtue, and value in the digital age. We are delighted to have found principled partners in Sam Altman and his talented team, who understand the commercial and social significance of journalists and journalism.”
OpenAI’s CEO, Sam Altman, echoed Thomson’s sentiments. “Our partnership with News Corp is a proud moment for journalism and technology,” Altman declared. “We greatly value News Corp’s history as a leader in reporting breaking news around the world and are excited to enhance our users’ access to its high-quality reporting.”
The News Corp deal is the latest in a series of agreements OpenAI has struck with major publishers, including Axel Springer (Politico, Business Insider), The Associated Press, Financial Times, and Dotdash Meredith (Investopedia).
These partnerships come as Microsoft, OpenAI, and Google face increasing scrutiny over their use of copyrighted content to train AI models without proper compensation or consent.
Data has long been termed the ‘new oil,’ and the label looks even more apt now that AI companies, wading through a sea of lawsuits, are queuing up to buy data rather than scrape it freely from the internet.
Moreover, without a steady supply of fresh data, AI companies risk their models becoming outdated, frozen in time at their ‘knowledge cut-off.’
News data is among the best data for training and optimizing models while keeping them up to date with current events.
The AI training data divide
While some media organizations have chosen to collaborate with AI companies, others have taken legal action.
The New York Times and a group of eight major newspapers owned by Alden Global Capital have filed lawsuits against OpenAI and Microsoft, seeking billions in damages for allegedly using unlicensed content to train their AI systems.
Meanwhile, journalists and unions have expressed fears that management could leverage AI to generate content for their brands, potentially leading to job cuts and a decline in the quality and accuracy of reporting.
Experiments with AI-generated content by publications like Sports Illustrated and CNET have already faced massive backlash due to numerous errors and inconsistencies.
The Independent Association of Publishers’ Employees (IAPE) union, representing Dow Jones workers in the US and Canada, voiced disappointment that no agreement on AI protection for bylined work had been reached before the announcement of the News Corp-OpenAI partnership.
That’s a key point. News companies are powered by human writers, who now see their work sold out from under them after the fact.
Legal? Sure. Ethical? Maybe not so much – at least not without writers being allowed to opt out.
In the end, AI companies will continue buying up data, which is essential to creating the best models. News companies may come to regret these deals if their sites become redundant at the hands of generative AI.