Google to use everything you post online to train its AI

July 6, 2023

Google says that all publicly available data on the internet is fair game to scrape and use to train its AI products.

Google’s updated privacy policy now states that “Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public.” It goes on to say that it uses publicly available information to “help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.” 

The previous version of the policy referred to Google using the data to train “language models,” whereas it now refers to “AI models,” which broadens its scope.

The loosely defined “information” and the expanded list of training targets suggest that if you post something publicly online, you should expect Google to scrape it and add it to its training data.

We understand that if we post a comment on Facebook, tweet something, or write a review on Amazon, it’s out there for the public to read. We don’t expect it to be private. But are you comfortable with your words being used to train an AI model?

The change in policy wording may also signal Google’s intent to ramp up its scraping efforts. And the pace at which Google and other AI companies are scraping publicly available data is already having a serious impact on several platforms.

Twitter recently limited access to its services as its servers struggled to keep up with “extreme levels of data scraping and system manipulation,” according to Elon Musk. Twitter also removed free access to its API in an effort to curb scrapers and consequently broke a lot of third-party services that rely on the API.

Reddit has not gone unscathed in this rush for data either. It too removed free access to the Reddit API, partly due to exploitation by scrapers. The resulting backlash from the Reddit moderators who make use of the API has effectively shut down parts of the internet.

Hundreds of the largest subreddits were made private or invisible by protesting subreddit moderators. The owners of Reddit are issuing not-so-subtle ultimatums to the moderators to open the subreddits back up again, but the protest continues.

The irony is that Google is suffering as a consequence too. Appending “Reddit” to a Google search query has become a popular way to get very specific results. The Reddit blackout has now rendered a lot of those search results inaccessible.

Most platforms have terms of service that prohibit data scraping, but breaking the terms of service doesn’t necessarily equate to breaking the law. While the platforms and the AI companies work that out, make sure that you’re OK with Google and others using your data to train their AI models before posting anything online.

Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.

