Google says that all publicly available data on the internet is fair game to scrape and use to train its AI products.
The previous version of the policy said Google used the data to train “language models.” It now says “AI models,” which broadens the policy’s scope.
The loosely defined “information” and the expanded range of models trained on this data suggest that if you post something online, you should expect Google to scrape it and add it to its training data.
We understand that if we post a comment on Facebook, tweet something, or write a review on Amazon, it’s out there for the public to read. We don’t expect it to be private. But are you comfortable with your words being used to train an AI model?
The change in policy wording may also be a signal of Google’s intent to ramp up its scraping efforts. And the pace at which Google and other AI companies are scraping publicly available data is having devastating impacts on numerous platforms.
Twitter recently limited access to its services as its servers struggled to keep up with “extreme levels of data scraping and system manipulation,” according to Elon Musk. Twitter also removed free access to its API in an effort to curb scrapers and consequently broke a lot of third-party services that rely on the API.
Reddit has not gone unscathed in this rush for data either. It too removed free access to the Reddit API, partly due to exploitation by scrapers. The resulting backlash from the Reddit moderators who rely on the API has effectively shut down parts of the site.
Hundreds of the largest subreddits were made private or invisible by protesting moderators. Reddit’s owners are issuing not-so-subtle ultimatums to the moderators to reopen the subreddits, but the protest continues.
The irony is that Google is suffering as a consequence too. Appending “Reddit” to a Google search query has become a popular way to get very specific results for a query. The Reddit blackout has rendered a lot of those search results inaccessible now.
Most platforms have terms of service that prohibit data scraping, but breaking the terms of service doesn’t necessarily equate to breaking the law. While the platforms and AI companies work it out, make sure that you’re OK with Google and others using your data to train their AI models before posting anything online.