Most people would agree that the internet can be a hostile environment, but what if you were exposed to the very worst it had to offer every day?
Content moderators are tasked with sifting through text, images, and video and manually flagging harmful content, from racial slurs and hate speech to discussions or depictions of murder and suicide.
The damaging psychological impacts of the job are well-documented, both anecdotally in the form of open letters from those inside the industry and in academic studies.
The burgeoning generative AI industry has fueled fresh demand for content moderators, and once again, stories from inside that challenging job are beginning to surface.
Data workers in Kenya speak out about AI content moderation
In Kenya, several former content moderators for OpenAI’s ChatGPT have lodged a petition to the Kenyan government, demanding an investigation into what they describe as “exploitative conditions.”
The exploitative conditions in question concern OpenAI's contracting of Sama, a data annotation services company based in California.
As per the petition, “Throughout the contract of training ChatGPT, we were not afforded psychosocial support. Due to the exposure to this kind of work, training ChatGPT, we have developed severe mental illnesses, including PTSD, paranoia, depression, anxiety, insomnia, sexual dysfunction, to mention a few.”
TIME, which also investigated Sama's relationship with Meta in Kenya on a similar project, reviewed documents suggesting OpenAI signed three contracts with Sama worth around $200,000 in total. The contracts involved labeling textual descriptions of sexual abuse, hate speech, and violence.
Around 36 workers split across three teams worked on the project, one team per subject. All workers interviewed by TIME said the task impacted their mental health.
Mophat Okinyi, a former moderator, revealed the psychological toll the work has had on him. “It has really damaged my mental health,” said Okinyi. He recalled viewing up to 700 text passages a day, many containing graphic sexual violence, which led to paranoia and isolation. He eventually lost his wife, who said he was a “changed man.”
TIME reported that one worker had to read a graphic description of bestiality in the presence of a child, describing it as “Torture.” He went on to say, “You will read a number of statements like that all through the week. By the time it gets to Friday, you are disturbed from thinking through that picture.”
The Kenya petition draws attention to the horrific content the contractors had to review, which often involved scenes of violence, self-harm, murder, rape, necrophilia, child abuse, bestiality, and incest. As per a Sama spokesperson, the workers earned between $1.46 and $3.74 an hour for the job.
Low wages for AI-related data services are well-documented on social media, with one Redditor speaking of their experience of training Bard, “20$/hr is not enough for the horrible treatment we get, so I’m gonna squeeze every cent out of this ******* job.”
$20/hr is a far cry from the sub-$5/hr paid in Kenya. Should AI companies be so quick to race to the bottom when the work itself is business-critical and the content hazardous?
Foxglove, a legal non-profit supporting Kenyan workers' cases against Meta and OpenAI, describes this as blatant low-wage labor exploitation.
Now four former data labellers are asking the Kenyan parliament to put a stop to this exploitation – and end the shady outsourcing by companies like Sama, who lure in young Africans with the prospect of tech jobs, only to throw them away when they dare to seek a better deal.
— Foxglove (@Foxglovelegal) July 12, 2023
Cori Crider, director of Foxglove, argued, “The outsourcing of these workers is a tactic by tech companies to distance themselves from the awful working conditions content moderators endure.”
These workers moderated child sexual abuse content, plus incest, bestiality, rape, sex trafficking and sex slavery.
— Foxglove (@Foxglovelegal) July 25, 2023
Why are human content moderators needed?
Training AI models requires considerable human effort to build and prepare datasets.
When OpenAI and other AI developers build their datasets, they typically collect data from the real world, generate synthetic data, and scrape data from the internet, including images and text from websites, messaging boards, forums, and so on.
Once collected, the data has to be pre-processed, including removing harmful, hateful, and discriminatory content. Moreover, human teams fine-tune iterations of AI models by inputting potentially risky or harmful prompts and analyzing the responses.
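The first of these steps, rule-based pre-filtering, can be sketched in a few lines. This is purely illustrative: the blocklist and sample data below are hypothetical placeholders, and production pipelines rely on trained toxicity classifiers and human review rather than simple keyword matching.

```python
import re

# Hypothetical placeholder terms; a real blocklist would be far larger.
BLOCKLIST = {"slur1", "slur2"}

def is_flagged(text: str) -> bool:
    """Return True if the text contains any blocklisted term."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return any(token in BLOCKLIST for token in tokens)

def pre_filter(samples: list[str]) -> tuple[list[str], list[str]]:
    """Split raw samples into (clean, flagged-for-human-review)."""
    clean, flagged = [], []
    for sample in samples:
        (flagged if is_flagged(sample) else clean).append(sample)
    return clean, flagged

clean, flagged = pre_filter(["hello world", "contains slur1 here"])
```

Note that even in this toy version, the flagged bucket is not discarded automatically: ambiguous material is routed to human reviewers, which is exactly where the moderation burden described above comes from.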
These processes enable researchers to “align” the AI with ethical and social values, obtaining a clean, neutral AI that isn’t susceptible to volatile behavior. Or at least, that’s the ideal for proprietary public models like ChatGPT and Bard.
AI alignment is a highly imperfect science that cannot be achieved without layers of human input.
While other AI tools can pre-filter data, removing more overt instances of hateful or toxic content, their accuracy is far from assured, and some will inevitably slip through the net. The task is further complicated by human ingenuity in constantly inventing ways to subvert AI content filtering, for example by replacing words with emojis, a technique regularly used to bypass filters on social media.
In this particular scenario, OpenAI confirmed to TIME that Sama employees in Kenya were helping build a tool designed to detect harmful content, which was eventually built into ChatGPT.
OpenAI responds to the petition
In mid-July, OpenAI responded to the concerns about the psychological impact of content moderation work.
In a statement to ITWeb Africa, an OpenAI spokesperson said, “We recognise this is challenging work for our researchers and annotation workers in Kenya and around the world – their efforts to ensure the safety of AI systems have been immensely valuable.”
The spokesperson continued, “Our mission is to build safe and beneficial AGI (artificial general intelligence), and human data annotation is one of the many streams of our work to collect human feedback and guide the models toward safer behaviour in the real world. We believe this work needs to be done humanely and willingly, which is why we establish and share our own ethical and wellness standards for our data annotators.”
Martha Dark, the director of Foxglove, said, “ChatGPT is world-famous as a symbol of AI’s potential. But like Facebook before it, its success is built on an army of hidden and underpaid people who do the gruesome work of sifting through toxic content to make the service safe. Moderators in Kenya are forming the first content moderators’ union on the continent to fight back. This parliamentary petition is the latest demonstration of the power of organised tech workers. Foxglove supports this movement – and hopes Kenyan MPs will make urgent reforms to the outsourcing model that allows companies like Sama to enable exploitation by foreign tech giants.”
Mercy Mutemi, managing partner of the Kenyan law firm Nzili & Sumbi Advocates, added, “Kenyans have had enough of being big tech’s cash cow, where huge profits are extracted then sent overseas, leaving the young African workers, who made them, jobless and broken. I urge lawmakers to listen to these brave former ChatGPT data labellers and immediately investigate working conditions inside Kenya’s content moderation offices.”
In the separate case involving Meta, a Kenyan court ruled that Meta was responsible for the workers, not Sama, serving as a landmark decision that could change the nature of tech outsourcing.
Content moderation’s dark history
Content moderation has a grim history that dates back to the early days of the internet.
The modern internet is highly censored, and harmful content of various kinds is largely banned from mainstream websites. But people still try, and the burden of protecting online communities often falls on human shoulders.
The sheer volume of potentially offensive content is staggering. As reported by Forbes in 2020, Facebook’s content analysis and moderation AI flagged over 3 million pieces of content daily that possibly violated their community standards.
Facebook then employed about 15,000 content moderators, each sifting through thousands of pieces of content every day. Mark Zuckerberg admitted that approximately 1 out of 10 pieces of content escapes the net and goes live on Facebook or Instagram.
Despite advancements in automated content filtering, a 2021 study by researchers at Virginia Tech, St. Mary’s University, Texas, and the University of Texas at Austin estimated there are some 100,000 content moderators working worldwide.
The authors note that human interpretation is often necessary due to high accuracy requirements, the subjective nature of the task, and complex, ever-changing moderation policies.
Academic literature documents moderators developing forms of posttraumatic stress disorder (PTSD), stress, depression, and anxiety, among other psychiatric complications. Paid content moderators grapple with disturbing content while maintaining strict quotas for acceptable job performance and are often paid low wages.
Some content moderators are subject to extremely distressing content while working on what’s become known as the “terror queue” – the moderation queue containing the most disturbing content, including murder, suicide, and torture.
In The Verge's 2019 exposé of content moderation for Google and YouTube, a moderator working with Alphabet in Austin, Texas, said, "If I said it didn't affect me, it's a complete lie. What you see every day … it shapes you." He continued, "At the beginning, you'd see everybody saying, 'Hi, how are you?' Everybody was friendly. They'd go around checking in. Now nobody is even wanting to talk to the others."
Another said, “Every day you watch someone beheading someone, or someone shooting his girlfriend. After that, you feel like wow, this world is really crazy. This makes you feel ill. You’re feeling there is nothing worth living for. Why are we doing this to each other?”
While AI content filters are improving, lessening the burden on human content moderation teams, human oversight remains pivotal for catching content that slips past them.
Ultimately, when it comes to building AI training datasets, some level of human exposure to harmful content is largely unavoidable.
If Kenyan courts rule in favor of the content moderators and other outsourced contractors follow their lead, AI companies will have little choice but to pay fair compensation for this grueling work.
After all, the performance of their models depends on it.