The Financial Times (FT) is granting OpenAI access to its news archives as generative AI companies continue to secure private data sources.
Under the arrangement, ChatGPT can provide summaries of and direct quotes from FT articles, with hyperlinks back to the original content on the FT’s website.
As part of the deal, OpenAI has committed to collaborating with the FT to develop new AI-driven products. The FT is a ChatGPT Enterprise customer and has experimented with AI before, incorporating Anthropic’s Claude into a generative search tool called “Ask FT.”
John Ridding, FT CEO, said of the deal, “Apart from the benefits to the FT, there are broader implications for the industry. It’s right, of course, that AI platforms pay publishers for the use of their material. OpenAI understands the importance of transparency, attribution, and compensation – all essential for us.”
A fair sentiment – though some would disagree that OpenAI understands the importance of transparency and attribution.
Brad Lightcap, COO of OpenAI, also chimed in: “Our partnership and ongoing dialogue with the Financial Times is about finding creative and productive ways for AI to empower news organizations and journalists, and enrich the ChatGPT experience with real-time, world-class journalism for millions of people around the world.”
While access to the FT’s data is valuable for OpenAI, the company’s current datasets consist of trillions of words of dubiously ‘public’ or ‘open source’ data.
Deals with the FT and other media companies like Axel Springer come as AI companies acknowledge they need to start paying for data to address growing legal pressures. They’ve also become acutely aware that their models will quickly become outdated without fresh, high-quality data.
AI companies battle over data
The ethical stakes in AI data usage are immense. In their endless quest for data, tech giants like OpenAI, Google, and Meta have been reported to engage in practices that push or outright cross legal and ethical boundaries.
For instance, a New York Times investigation revealed that OpenAI developed a tool named Whisper to transcribe YouTube videos – despite potential violations of YouTube’s policies against using its videos for independent applications.
Similarly, Google and Meta have explored or implemented strategies that skirt or reinterpret existing copyright and privacy laws to gather more data.
Among the shadier strategies is altering privacy policies to allow AI applications to use publicly available content from platforms like Google Docs.
While AI companies are willing to pay for data now, that doesn’t stop them from bending the rules elsewhere.