Tech companies like Microsoft, NVIDIA, and Apple trade trust for data and talent

July 16, 2024

  • An investigation revealed how major tech companies used YouTube subtitles data
  • This potentially violates the platform's terms of service, as well as creator rights
  • Meanwhile, the UK antitrust regulator is probing the Microsoft-Inflection merger

In the mad dash to dominate the AI industry, tech giants are pushing ethical boundaries and testing the limits of public trust. 

A pattern of recent revelations raises alarm bells about data privacy, fair competition, and the concentration of power and talent. 

First off, an investigation by Proof News and WIRED uncovered that Apple, NVIDIA, Anthropic, and Salesforce have been using a dataset containing subtitles from over 170,000 YouTube videos to train their AI models. 

This dataset, known as “YouTube Subtitles,” was compiled without the consent of content creators, potentially violating YouTube’s terms of service.

The scale of this data mining operation is staggering. It includes content from educational institutions like Harvard, popular YouTubers such as MrBeast and PewDiePie, and even major news outlets like The Wall Street Journal and the BBC. 

YouTube has yet to respond, but back in April, CEO Neal Mohan told Bloomberg that if OpenAI's text-to-video model Sora had been trained on YouTube videos, it would be a "clear violation" of the platform's terms of service.

OpenAI isn’t among the accused on this occasion, and we don’t know whether YouTube will attempt to take action if the new allegations prove true. 

This is far from the first time tech companies have come under fire for their data usage practices. 

In 2018, Facebook faced intense scrutiny over the Cambridge Analytica scandal, where millions of users’ data was harvested without consent for political advertising. 

More pertinently to AI, in 2023, it was discovered that a dataset called Books3, containing over 180,000 copyrighted books, had been used to train AI models without authors’ permission. This led to a wave of lawsuits against AI companies, with authors claiming copyright infringement. 

That’s just one example from an ever-growing stack of lawsuits emanating from every corner of the creative industries. Universal Music Group, Sony Music, and Warner Records are among the most prominent names to join the list, banding together to sue text-to-audio AI companies Udio and Suno. 

In their rush to build more advanced AI models, it seems as if tech companies have adopted an “ask for forgiveness, not permission” approach to data acquisition.

The Microsoft-Inflection merger

While the YouTube scandal unfolds, Microsoft’s recent hiring spree from AI startup Inflection has caught the eye of UK regulators. 

The Competition and Markets Authority (CMA) has launched a phase one merger investigation, probing whether this mass hiring constitutes a de facto merger that could stifle competition in the AI sector.

This decisive move saw Microsoft scoop up Inflection’s co-founder Mustafa Suleyman (a former Google DeepMind executive) along with a significant portion of the startup’s staff.

Inflection once marketed itself as a proudly independent AI lab. It then proved to be part of a dying breed. 

The investigation takes on added weight when considering Microsoft’s existing partnerships in the AI field. The company has already invested some $13 billion in OpenAI, raising questions about market concentration. 

Thickening the plot, Microsoft recently gave up its non-voting observer seat on OpenAI’s board, a move experts say was likely intended to rein in its oversight of OpenAI and appease antitrust authorities. 

Alex Haffner, a competition partner at law firm Fladgate, said of Microsoft’s surprise decision, “It is hard not to conclude that Microsoft’s decision has been heavily influenced by the ongoing competition/antitrust scrutiny of its (and other major tech players) influence over emerging AI players such as OpenAI.”

A trust deficit?

Both the YouTube data mining scandal and Microsoft’s hiring practices contribute to a growing trust deficit between Big Tech and the public. 

An immediate impact is that content creators are becoming more guarded about their work for fear of exploitation. 

This could have a knock-on effect on content creation and sharing, ultimately impoverishing the very platforms that tech companies rely on for data.

Similarly, the concentration of AI talent in a few major companies risks homogenizing AI development and limiting diversity.

For tech companies, rebuilding trust will likely require more than mere compliance with future regulations and cooperation with antitrust investigations. 

Questions linger: can we harness the true potential of AI while preserving ethics, fair competition, and public trust?


Sam Jeans

Sam is a science and technology writer who has worked in various AI startups. When he’s not writing, he can be found reading medical journals or digging through boxes of vinyl records.
