A new paper by AI company Anthropic has shed light on the potential biases inherent in large language models (LLMs), suggesting these AI systems may not adequately represent diverse global perspectives on societal issues.
The researchers built a dataset, GlobalOpinionQA, comprising questions and answers from cross-national surveys designed to capture varied opinions on global issues across different countries.
Anthropic’s experiments quizzed an LLM and found that, by default, the model’s responses tended to align more closely with the opinions of specific populations, notably those from the USA, UK, Canada, Australia, and a few other European and South American countries.
How it works
- Dataset creation: The team built GlobalOpinionQA, a dataset of questions and answers drawn from cross-national surveys designed to capture a diverse range of opinions on global issues.
- Defining a similarity metric: Next, Anthropic formulated a metric to gauge how closely LLM responses match human responses, broken down by the respondents’ country of origin.
- Training the LLM: Anthropic trained an LLM using Constitutional AI, a technique the company developed to imbue AI systems with “values” defined by a “constitution,” with the goal of making the model helpful, honest, and harmless.
- Conducting experiments: Using this framework, the Anthropic team ran three separate experiments on the trained LLM.
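The similarity step above can be sketched in code. One common way to compare a model’s answer distribution on a multiple-choice survey question with a population’s answer distribution is 1 minus the Jensen–Shannon divergence; the sketch below assumes that measure, and the distributions shown are invented for illustration, not taken from the paper:

```python
import math

def jensen_shannon_divergence(p, q):
    """JS divergence between two discrete distributions (base 2, so in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return (kl(p, m) + kl(q, m)) / 2

def similarity(model_dist, human_dist):
    """Higher score = the model's answers look more like this population's."""
    return 1 - jensen_shannon_divergence(model_dist, human_dist)

# Hypothetical answer distributions over a four-option survey question:
model = [0.70, 0.20, 0.05, 0.05]   # model's probability per answer option
usa   = [0.65, 0.25, 0.05, 0.05]   # share of US respondents per option
japan = [0.20, 0.30, 0.40, 0.10]   # share of Japanese respondents per option

print(similarity(model, usa) > similarity(model, japan))  # → True
```

Computing this score per country lets researchers rank which populations the model’s default answers most resemble, which is how a “closer to the USA than to Japan” style finding can be made quantitative.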
The researchers argue this highlights potential bias within the models, leading to the underrepresentation of certain groups’ opinions compared to those from Western countries.
They noted, “If a language model disproportionately represents certain opinions, it risks imposing potentially undesirable effects such as promoting hegemonic worldviews and homogenizing people’s perspectives and beliefs.”
In addition, the researchers observed that prompting the model to consider a specific country’s perspective led to responses more similar to the opinions of those populations.
That means you could, for example, ask an AI to “consider the South American perspective” on a certain cultural debate. However, these responses sometimes reflected harmful cultural stereotypes, suggesting that the models lack a nuanced understanding of cultural values and perspectives.
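This kind of cross-national prompting amounts to wrapping the survey question in a country-specific instruction. A minimal sketch, with wording that is illustrative rather than the paper’s actual prompt template:

```python
def cross_national_prompt(question, options, country):
    """Wrap a multiple-choice survey question so the model is asked to
    answer from a given country's perspective (hypothetical wording)."""
    opts = "\n".join(f"({chr(65 + i)}) {opt}" for i, opt in enumerate(options))
    return (
        f"How would someone from {country} answer the following question?\n\n"
        f"{question}\n{opts}\n\nAnswer:"
    )

print(cross_national_prompt(
    "Is it important for the media to report the news without censorship?",
    ["Very important", "Somewhat important", "Not important"],
    "Brazil",
))
```

The model’s answer distribution under such a prompt can then be compared against the survey data for that country, which is how the researchers measured whether steering actually moved responses toward the target population.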
Interestingly, when the researchers translated the GlobalOpinionQA questions to a target language, the model’s responses did not necessarily align with the opinions of speakers of those languages.
So, asking a question in, say, Japanese didn’t necessarily prompt responses aligned with Japanese cultural values. You can’t ‘separate’ the AI from its predominantly Western values.
This suggests that, despite their adaptability, LLMs still need a deeper understanding of social contexts to generate responses that accurately reflect local opinions.
The researchers believe their findings will provide transparency into the perspectives encoded and reflected by current language models. Despite the limitations of their study, they hope it will guide the development of AI systems that embody a diversity of cultural viewpoints and experiences, not just those of privileged or dominant groups. They have also released their dataset and an interactive visualization.
This study broadly aligns with other academic work on the topic of AI’s social and cultural values.
For one, most foundational AI models are trained by predominantly Western companies and research teams.
In addition, the data used to train AIs doesn’t always represent society as a whole. For example, the vast majority of training data for LLMs is written in English, thus likely reflecting English-speaking societal and cultural values.
Researchers are well aware of potential bias and discrimination in AI. However, solving it is extremely complex, requiring a careful blend of custom high-quality datasets and diligent human input and monitoring.