In its submission to the ongoing review of the Australian AI regulatory framework, Google has asked for looser copyright laws for AI training data.
Over the last few months, Australia has stepped up its efforts to regulate the use of AI within its borders. In June it launched a review of its AI regulatory framework, and one of the discussion points is how online data is used to train AI models.
Google has long supported a “fair use” approach rather than stricter copyright laws that would block AI data scrapers completely. A commonly cited example of fair use is the way Google crawls the internet to produce relevant search results.
Google’s web crawler copies some of a website’s content and then displays excerpts of it in the search results. For the most part, this isn’t regarded as a copyright breach. If you don’t want Google to crawl your site, you can block its web crawler.
In its submission to the Australian authorities, Google suggests that companies like it should be allowed to scrape published material to train their AI models unless publishers actively opt out. OpenAI will also be following this discussion with interest, having recently released its own web crawler, GPTBot.
In a blog post in July, Google suggested that publishers could perhaps use an opt-out mechanism similar to the robots.txt files websites already use. These files contain instructions that either allow or disallow crawlers, such as Google’s, from crawling the site for a search engine.
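To make the robots.txt idea concrete, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The robots.txt content, the example.com URL and the helper name `build_parser` are hypothetical, chosen only for illustration: the file lets Google’s search crawler (Googlebot) in while blocking OpenAI’s GPTBot. Python’s parser is a simplified reading of the robots.txt rules, not Google’s own matcher, and nothing here reflects the specific opt-out mechanism Google proposed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: Google's search crawler is
# allowed everywhere, while OpenAI's GPTBot is disallowed from every path.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/articles/some-story"  # hypothetical URL
    for agent in ("Googlebot", "GPTBot"):
        # can_fetch reports whether this user agent may crawl the URL
        print(agent, rp.can_fetch(agent, url))
```

Note that robots.txt only works if a crawler chooses to honour it, and a crawler that isn’t named in the file (and with no catch-all `User-agent: *` group) is allowed by default. That default is essentially the opt-out model Google is proposing.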
That’s not how copyright works
The solution Google is proposing runs contrary to the basic principle of copyright, though.
Dr Kayleen Manwaring, a senior lecturer at UNSW Law and Justice, told The Guardian, “If you want to reproduce something that’s held by a copyright owner, you have to get their consent, not an opt out type of arrangement … what they’re suggesting is a wholesale revamp of the way that exceptions work.”
That makes sense. If you want to copy and use someone’s work, you need to ask them for permission. Google is suggesting that if you haven’t expressly told it not to, then your data should be fair game.
With creatives across a variety of genres decrying the wholesale consumption of their work by generative AI models, Google may be swimming upstream on this issue.
Adding to its woes is a growing crackdown on how Google currently scrapes and uses content from news publishers.
Ultimately, users want useful AI tools, and that requires training AI models on large amounts of human-generated content. Finding an equitable way to do that isn’t going to be easy.