Elon Musk temporarily restricted the number of tweets users can view in a day. This is described as a “temporary emergency measure.”
In his own tweet, Musk shared that unverified accounts are now limited to reading 1,000 posts per day. New unverified accounts are limited to 500 posts, while those with “verified” status are currently restricted to viewing 10,000 posts per day.
Initially, Musk had imposed stricter limits but revised them within hours of the announcement.
To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:
– Verified accounts are limited to reading 6000 posts/day
– Unverified accounts to 600 posts/day
– New unverified accounts to 300/day
Elon Musk (@elonmusk), July 1, 2023
Musk stated that these temporary restrictions were in response to “extreme levels of data scraping and system manipulation.”
He noted on Friday, “We were getting data pillaged so much that it was degrading service for normal users,” after users saw screens asking them to log in to see Twitter content.
Musk initially set reading limits of 6,000 posts per day for verified accounts, 600 for unverified accounts, and 300 for new unverified accounts. In a subsequent update, Musk stated that “several hundred organizations, maybe more, were scraping Twitter data extremely aggressively.”
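The tiered limits described above amount to a per-account daily read quota. The following is a minimal sketch of that idea, assuming a simple in-memory counter keyed by account and day. The names `DAILY_READ_LIMITS`, `ReadQuota`, and `try_read` are hypothetical, and nothing here reflects how Twitter actually enforces its limits; only the three numbers come from the revised figures reported above.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Revised daily limits reported in the article (posts readable per day).
DAILY_READ_LIMITS = {
    "verified": 10_000,
    "unverified": 1_000,
    "new_unverified": 500,
}


@dataclass
class ReadQuota:
    """Hypothetical in-memory tracker of posts read per account per day."""

    counts: dict = field(default_factory=lambda: defaultdict(int))

    def try_read(self, account_id: str, tier: str, day: str, n_posts: int = 1) -> bool:
        """Record n_posts reads if the account stays within its tier's limit."""
        key = (account_id, day)
        if self.counts[key] + n_posts > DAILY_READ_LIMITS[tier]:
            return False  # the limit would be exceeded, so reject the request
        self.counts[key] += n_posts
        return True
```

Under these assumptions, a new unverified account that has already read 500 posts on a given day would have its next read rejected, and the counter starts from zero for the following day because the day is part of the key.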
Data scraping is the automated extraction of information from websites, typically carried out by software bots.
To build complex large language models (LLMs), AI companies require data from real human conversations, and where better to look for that data than the internet? To collect such data, bots tirelessly crawl sites like Twitter and extract text data.
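To make the extraction step concrete, here is a minimal sketch of how a bot might pull text out of a page it has already downloaded, using only Python's standard-library `html.parser`. The sample page, the `TextExtractor` class, and the `extract_text` helper are illustrative assumptions; real scrapers also handle fetching, pagination, and rate limits, none of which is shown here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a script or style element.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> list[str]:
    """Return the visible text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page content standing in for a downloaded page.
SAMPLE_PAGE = """
<html><body>
  <div class="post"><p>First example post.</p></div>
  <div class="post"><p>Second example post.</p></div>
  <script>var tracking = 1;</script>
</body></html>
"""

print(extract_text(SAMPLE_PAGE))
```

Repeated across millions of pages, this kind of loop is what generates the server load that platforms object to.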
However, though available to the public, much of this data is not there for the taking. Platforms like Twitter and Reddit want to be paid for their data.
Moreover, data scraping bots place strain on servers. Musk, who is critical of AI, said, “It is rather galling to have to bring large numbers of servers online on an emergency basis just to facilitate some AI startup’s outrageous valuation.”
Similarly, in April, Steve Huffman, Reddit’s CEO, told the New York Times, “The Reddit corpus of data is really valuable, but we don’t need to give all of that value to some of the largest companies in the world for free.”
Twitter has already begun to charge for access to its application programming interface (API), which is often used by third-party apps, researchers, and AI companies.
But whose data is it anyway?
There is a form of digital guerrilla warfare taking place on the servers hosting sites like Reddit and Twitter.
Data scrapers are intensively mining the internet to fuel AI models, even when that data is not intended to be used in such a way.
Platforms such as Reddit and Twitter are perfectly within their rights to crack down on data crawling, but it’s no easy task.
Scraping is against these sites’ terms of service but probably not illegal – though that depends on what you’re using the data for.
In essence, data scraping is a form of digital trespass. You’re still on someone’s property even if you’re not doing anything illegal.
Twitter seems to be developing novel techniques to curb data scraping, which certainly makes sense given Musk’s general criticisms of the AI industry and some of its key players.