OpenAI: creating AI tools without using copyrighted material is “impossible”

January 8, 2024


In a statement of written evidence to the UK House of Lords, OpenAI stated that creating AI tools without using copyrighted material is “impossible.”

This comes amid an intensifying debate surrounding copyright’s interaction with AI, with authors, writers, and media outlets like the New York Times lodging lawsuits against OpenAI, Microsoft, Stability AI, Anthropic, Google, and Midjourney, to name but a few. 

Large language models (LLMs) such as ChatGPT and image generators like Midjourney, which recently hit the headlines for creating a database of 16,000 artists for model training purposes, rely on extensive copyrighted data for their training. 

In fact, copyrighted data forms the mainstay of AI training material because it’s abundant, covers a broad spectrum of human creativity, and is easily retrieved from the internet.

AI companies argue this data is ‘fair use’ for their model training purposes, but many others disagree.  

In its response to the House of Lords Communications and Digital Select Committee, OpenAI emphasized its need for copyrighted material to train LLMs like GPT-4.

OpenAI stated, “Because copyright today covers virtually every sort of human expression – including blogposts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials.”

The company further argued that restricting training materials to public domain sources would result in poor AI systems. 

“Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens,” OpenAI added.

The full written evidence submission also touches on the future trajectory of AI, regulation, and catastrophic risks, in response to which OpenAI points to its Frontier Model Forum and Preparedness team.

The public reacts

Reactions to these statements have not exactly been sympathetic.

Dr. Gary Marcus, a prominent voice in the industry, said the statement essentially self-labels AI models as a monetization device for stolen copyrighted work.

Indeed, it seems like this is almost a Freudian slip on OpenAI’s part, admitting that their business model is unworkable without manipulating the law.

There’s a palpable sense of injustice with so few in the upper echelons of Silicon Valley benefitting from the work of so many.

OpenAI’s statement also asserts that the company understands ‘the needs’ of today’s ‘citizens,’ exposing a widening disconnect between big tech’s view of generative AI as a humanitarian, even philanthropic, project and people’s fears that it’s stealing their data and displacing their skills.

Dr. Marcus commented, “[AI companies]…should go back to the drawing board—and figure out how to build software that doesn’t have a plagiarism problem—rather than fleecing artists, writers, and other content providers.” 

Lawsuits are racking up

This also comes amid several lawsuits against OpenAI, with notable authors like John Grisham, Jodi Picoult, and George RR Martin suing the company in September 2023 for alleged “systematic theft on a mass scale.”

Two esteemed journalists, Nicholas Gage and Nicholas Basbanes, lodged yet another complaint against OpenAI and Microsoft last week, adding to the growing number of legal challenges faced by AI companies from both the writing and visual arts communities.

OpenAI also responded to the New York Times lawsuit, stating that it considers the case to be “without merit.”

These developments raise concerns about the potential legal liabilities AI companies might face this year and in the future. How will they adapt? Will the public’s growing resistance have any impact on the industry’s trajectory?

And how can you ethically train large-scale generative AI models? Are ethics even compatible with the technology’s current incarnation? 

AI companies’ defenses are holding up so far, but the wedge between AI developers’ ideas of ‘fair use’ and how others perceive it is widening. 


Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.

