Ed Newton-Rex, ex-VP of Audio at Stability AI, announces “Fairly Trained”

January 18, 2024


Ed Newton-Rex, ex-VP of Audio at Stability AI, announced the launch of ‘Fairly Trained,’ a non-profit organization dedicated to certifying generative AI companies for fairer training data practices. 

The initiative aims to distinguish between companies that train their AI on unfairly scraped data and those that adopt a more ethical approach by licensing or creating their own proprietary data.

Newton-Rex stated on X, “It’s hard to know which generative AI companies train on scraped data and which take a more ethical approach by licensing. So today we’re launching Fairly Trained, a non-profit that certifies generative AI companies for fairer training data practices.”

This comes amid escalating criticism of laissez-faire data scraping for the purpose of training AI models, a process that has left copyright holders seething. The debate kicked up another notch earlier in January when a list of 16,000 artists used to train and optimize Midjourney was leaked.

Following that, brands like Magic: The Gathering and Wacom, which rely on human creativity, were fiercely lambasted for using AI-generated images on social media. Meanwhile, reports of jobs being replaced by AI surfaced on social media, including at Duolingo.

Midjourney and Stability AI, Newton-Rex’s former company, are currently locked in a copyright lawsuit that is set to progress toward a ruling this year. It’s one of many complaints lodged against the likes of OpenAI, Anthropic, Meta, Midjourney, Stability, and others. 

Stability AI has been scrutinized for using millions of copyrighted images and audio files in its models, raising questions about the boundaries of ‘fair use’ – which Newton-Rex now intends to address with the Fairly Trained program.

Newton-Rex resigned from Stability AI last year, stating on X, “I’ve resigned from my role leading the Audio team at Stability AI because I don’t agree with the company’s opinion that training generative AI models on copyrighted works is ‘fair use.’”

Despite his resignation from Stability AI, Newton-Rex expressed optimism about achieving a harmonious relationship between generative AI and the creative industries, an outlook that the launch of Fairly Trained now underscores.

The Fairly Trained program

‘Fairly Trained’ introduces its first certification, the ‘Licensed Model (L) Certification.’

The goal is to highlight AI models that use training data ethically, ensuring no copyrighted work is used without a license. This applies to AI models across multiple fields like image and music generation.

To meet the criteria, training data must be either:

  • Contractually agreed upon with rights-holders.
  • Under an appropriate open license.
  • In the global public domain.
  • Owned by the model developer.

Companies must thoroughly check the rights status of their training data, and detailed records of the training data used must be maintained. The application involves a detailed written submission and a review process, concluding with certification and annual reevaluation.
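As a rough illustration of what such record-keeping could look like in practice, the sketch below tags each training data source with one of the four permitted bases listed above and flags anything without a documented rights status. This is not Fairly Trained's actual process or tooling; the class names, the `UNKNOWN` category, and the sample sources are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RightsBasis(Enum):
    # The four bases mirror the criteria listed in the article.
    CONTRACT_WITH_RIGHTS_HOLDER = "contractually agreed with rights-holders"
    OPEN_LICENSE = "appropriate open license"
    GLOBAL_PUBLIC_DOMAIN = "public domain globally"
    OWNED_BY_DEVELOPER = "owned by the model developer"
    # Hypothetical catch-all for data with no documented rights status.
    UNKNOWN = "unknown"


@dataclass
class TrainingRecord:
    source: str          # hypothetical dataset or source name
    basis: RightsBasis   # why the developer believes it may use the data


def undocumented_records(records):
    """Return the records that do not fall under any permitted basis."""
    return [r for r in records if r.basis is RightsBasis.UNKNOWN]


def meets_criteria(records):
    """True only if there is at least one record and none is undocumented."""
    return len(records) > 0 and not undocumented_records(records)


# Hypothetical sample records, for illustration only.
sample = [
    TrainingRecord("in-house-recordings", RightsBasis.OWNED_BY_DEVELOPER),
    TrainingRecord("label-catalog-deal", RightsBasis.CONTRACT_WITH_RIGHTS_HOLDER),
    TrainingRecord("scraped-web-images", RightsBasis.UNKNOWN),
]

for record in undocumented_records(sample):
    print(f"Needs review: {record.source}")
print("Meets criteria:", meets_criteria(sample))
```

A single undocumented source is enough to fail the check in this sketch, which reflects the article's description that no copyrighted work may be used without a license.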

Newton-Rex concedes that this first certification doesn’t address all concerns around generative AI training, such as the opt-in vs. opt-out debate, but it’s a step forward.

Thus far, the program has been well received. Dr. Yair Adato of BRIA AI commended it, stating in a blog post, “We proudly support the Fairly Trained certification. This initiative counters the industry’s opacity in data procurement, ensuring companies meet rigorous ethical standards.”

Christopher Horton, SVP at Universal, said, “We welcome the launch of the Fairly Trained certification to help companies and creators identify responsible generative AI tools that were trained on lawfully and ethically obtained materials.”

Fairly Trained has already certified nine generative AI companies across image generation, music creation, and voice synthesis, including Beatoven.AI, Boomy, BRIA AI, Endel, LifeScore, Rightsify, Somms.ai, Soundful, and Tuney.

It will be interesting to see which other companies sign up for the program and how transparent they make their data. Ideally, the public should be able to see the datasets for themselves (provided the data is public domain or not otherwise proprietary or protected).

There is some complexity in the certification, as data relied on as public domain must be in the “public domain globally,” which could be tricky to navigate due to varying copyright laws across different jurisdictions.

What is considered public domain in one country may not be in another. For instance, a literary work published before 1929 is in the public domain in the United States, but the same work might still be under copyright in Europe if its author died less than 70 years ago.

In any case, Fairly Trained’s requirement for data to be “in the public domain globally” implies a high standard.
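To see why a “global” requirement is stricter than any single country's rule, consider a simplified sketch in which each jurisdiction protects a work for a fixed number of years after the author's death. The jurisdiction names and term lengths below are hypothetical placeholders, not statements of any country's law, and the model ignores publication-based rules and other exceptions.

```python
# Hypothetical jurisdictions with illustrative "life plus N years" terms.
# These are placeholders, not a statement of any real country's law.
TERMS = {
    "Jurisdiction A": 50,
    "Jurisdiction B": 70,
    "Jurisdiction C": 100,
}


def public_domain_year(author_death_year, term_years):
    """First year a work is public domain, assuming protection runs
    through the end of the calendar year in which the term expires."""
    return author_death_year + term_years + 1


def is_public_domain_globally(author_death_year, year, terms):
    """A work qualifies only if it is public domain in every jurisdiction."""
    return all(
        year >= public_domain_year(author_death_year, term)
        for term in terms.values()
    )


# An author who died in 1950: free to use in A and B by 2024, but not in C.
for name, term in TERMS.items():
    print(name, "public domain from", public_domain_year(1950, term))
print("Globally public domain in 2024:", is_public_domain_globally(1950, 2024, TERMS))
```

Under these assumptions the longest term among all jurisdictions decides the outcome, so one long-term jurisdiction is enough to keep a work out of the “public domain globally” category.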

Could this be the year of increased accountability for AI companies and more transparent data practices? 


Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.
