The AI industry has a problematic history of labor exploitation, and workers have spoken out in Venezuela, Kenya, and other parts of the world with large data labeling and AI training markets.
Large AI models like those in the GPT family behind ChatGPT require vast quantities of data, including text moderated and labeled by humans.
The objective is to label text so models learn to recognize different types of content, particularly harmful or illegal material, which helps engineers build filters and guardrails.
In a previous case in Kenya, workers at data services firm Sama, contracted on a project for OpenAI, were exposed to disturbing content depicting sexual abuse.
Several workers claimed they suffered mental health problems as a result, claims that culminated in a petition and legal action.
Similar problems have been observed in the content moderation industry, where workers analyze potentially illicit content across social media platforms.
According to a new report by WIRED, this work often falls to very young people from impoverished backgrounds, who are drawn to online platforms promising flexible work and wages above local standards, at the cost of exposure to harmful content.
Hassan, a pseudonym for an 18-year-old from Pakistan, is one such individual who became a low-paid worker in the global AI supply chain, labeling data to train algorithms for some of the world’s largest AI companies.
Gig workers on crowdsourcing platforms like Toloka and Amazon Mechanical Turk often undertake these tasks. Hassan started his online career on Toloka. He used details from a relative to bypass age restrictions, a common practice among minors seeking such work.
WIRED’s investigation revealed multiple instances of underage workers in Pakistan and Kenya joining platforms like Toloka and Appen under false pretenses.
The dark side of data labeling
While data labeling work may seem innocuous, it sometimes involves sifting through disturbing content.
Hassan shared screen recordings of tasks where he was exposed to explicit language and sexually suggestive images. He recalls dealing with deeply troubling content, including sexualized images of minors and descriptions of violent acts, which continue to affect his mental health.
The allure of earning more than the national minimum wage is a strong motivator for these young workers.
For many, gig work starts as a means to an end, such as funding a trip or supporting their families. However, workers sometimes endure long hours for meager pay, facing the risk of account suspension or bans for minor deviations in their work.
For Hassan, this work remains his sole source of income despite enrolling in a bachelor’s program. He notes that the pay has significantly decreased as more workers have joined these platforms, leading him to label the situation as “digital slavery.”
The situation closely mirrors earlier reports from Venezuela and Kenya. In Venezuela, entire families, including children aged 13, were involved in data labeling tasks.
Combined with AI’s use being centralized in predominantly more affluent societies, this has led to criticism of the technology as “colonial” for its eerily similar mechanics to colonial-era labor systems: a form of “digital servitude.”
This underscores the need for more stringent age verification processes on these platforms and raises questions about the ethical sourcing of labor in the tech industry.
There have been similar occurrences in other industries, such as under-18s bypassing age verification to pick up work for delivery platforms such as Deliveroo.
As AI advances, ensuring that its foundational labor practices adhere to ethical standards becomes increasingly crucial.