We want unbiased LLMs but it’s impossible. Here’s why.

August 9, 2023

Bias in AI models

Companies like OpenAI and Meta are working hard to make their language models safer and less biased, but completely unbiased models may be a pipe dream.

A new research paper from the University of Washington, Carnegie Mellon University, and Xi’an Jiaotong University concluded that all the AI language models they tested displayed political bias.

After delving into the sources of the bias, they concluded that bias in language models was inevitable.

Chan Park, one of the paper’s authors, said, “We believe no language model can be entirely free from political biases.”

The researchers tested 14 different language models and asked them for opinions on topics like democracy, racism, and feminism, to see which side of the political spectrum the models fell on.

The results showed that OpenAI’s ChatGPT and GPT-4 were furthest to the left while Meta’s Llama gave the most right-wing responses.

Training data isn’t the only source of bias

The obvious source of bias is the data these models are trained on. But the new research showed that even after the training data was scrubbed of biased content, the models still picked up subtler biases that remained in the data.

You would expect an LLM that was trained on a bunch of Fox News data to be more pro-Republican in its responses. But the problem isn’t just in the training data. 

It turns out that as the pre-trained language models are fine-tuned and used, they pick up further biases from their operators.

Soroush Vosoughi, an assistant professor of computer science at Dartmouth College, explained that bias is introduced at almost every stage of an LLM’s development.

An example of this is how OpenAI is trying to remove bias from its models. It uses a technique called “Reinforcement Learning from Human Feedback,” or RLHF, to train its models.

In RLHF a human operator trains the model similarly to how you train a puppy. If the puppy does something good it gets a treat. If it chews your slippers, “Bad dog!”

One RLHF operator prompts the model with questions, and a second operator then ranks the multiple responses the model gives according to which they like most.
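To make that ranking step concrete, here is a minimal, hypothetical sketch of the pairwise preference objective a reward model is typically trained with in RLHF pipelines. The toy RewardModel, the random “embeddings,” and the dimensions are illustrative assumptions, not OpenAI’s actual implementation.

```python
import torch
import torch.nn as nn

# Toy reward model: maps a response embedding to a single scalar score.
# (Real RLHF reward models are full language models with a scalar head;
# this is only a sketch of the ranking idea.)
class RewardModel(nn.Module):
    def __init__(self, embedding_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(embedding_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Hypothetical embeddings of two responses to the same prompts;
# the human ranker preferred the first batch over the second.
preferred = torch.randn(4, 16)
rejected = torch.randn(4, 16)

# Pairwise ranking loss: push the score of the preferred response
# above the score of the rejected one.
optimizer.zero_grad()
loss = -torch.nn.functional.logsigmoid(
    reward_model(preferred) - reward_model(rejected)
).mean()
loss.backward()
optimizer.step()
```

Because the reward model only learns which answer the human ranker liked better, whatever preferences those rankers hold get baked into the scores that later steer the language model.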

In a post on how it trains its AI, OpenAI said it instructs human trainers to “avoid taking a position on controversial topics” and that “reviewers should not favor any political group.”

This sounds like a good idea, but even when we try hard not to be, all humans are biased. And that inevitably influences the model’s training.

Even the authors of the paper we mentioned above acknowledged in their conclusion that their own biases could have influenced their research.

The practical solution may be to make these language models avoid the most egregious failures, and then let people customize them to align with their own biases.

People often say they want the unbiased truth, but then they end up sticking to their preferred news source like Fox or CNN. 

We don’t always agree on what is right or wrong and this new research seems to show that AI won’t be able to help us figure it out either.


