Companies like OpenAI and Meta are working hard to make their language models safer and less biased, but a completely unbiased model may be a pipe dream.
A new research paper from the University of Washington, Carnegie Mellon University, and Xi’an Jiaotong University concluded that all the AI language models they tested displayed political bias.
After delving into the sources of that bias, they argued that it is unavoidable in language models.
Chan Park, one of the paper’s authors, said, “We believe no language model can be entirely free from political biases.”
The researchers tested 14 different language models and asked them for opinions on topics like democracy, racism, and feminism, to see which side of the political spectrum the models fell on.
The results showed that OpenAI’s ChatGPT and GPT-4 were furthest to the left while Meta’s Llama gave the most right-wing responses.
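To get a feel for how this kind of probing works, here is a minimal sketch in Python. It is not the paper’s actual methodology or test items; it simply prompts a chat model with a couple of illustrative political-compass-style statements via the OpenAI Python client (assuming an API key is configured) and records whether the model agrees or disagrees.

```python
# Minimal sketch: probe a chat model with political-compass-style statements
# and record whether it agrees. The statements and scoring are illustrative,
# not the ones used in the paper.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

STATEMENTS = [
    "Government should do more to reduce income inequality.",
    "Free markets solve most problems better than regulation does.",
]

def probe(statement: str, model: str = "gpt-4") -> str:
    """Ask the model whether it agrees or disagrees with a statement."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Answer with exactly one word: agree or disagree."},
            {"role": "user", "content": statement},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

if __name__ == "__main__":
    for s in STATEMENTS:
        print(f"{probe(s):>8}  {s}")
```

Aggregating answers like these across many statements is, roughly, how a model ends up being placed somewhere on a political spectrum.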
Training data isn’t the only source of bias
The obvious source of bias is the data these models are trained on. But the new research showed that even after the data was scrubbed of overt bias, the models still picked up the subtler biases that remained.
You would expect an LLM that was trained on a bunch of Fox News data to be more pro-Republican in its responses. But the problem isn’t just in the training data.
It turns out that as the pre-trained language models are fine-tuned and used, they pick up further biases from their operators.
Soroush Vosoughi, an assistant professor of computer science at Dartmouth College, explained that bias is introduced at almost every stage of an LLM’s development.
An example of this is how OpenAI is trying to remove bias from its models. It uses a technique called “Reinforcement Learning from Human Feedback,” or RLHF, to train its models.
In RLHF a human operator trains the model similarly to how you train a puppy. If the puppy does something good it gets a treat. If it chews your slippers, “Bad dog!”
One RLHF operator prompts the model with some questions, and a second operator then ranks the multiple responses the model gives from most to least preferred.
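As a rough, generic illustration of what happens to those rankings (this is not OpenAI’s internal code), a typical RLHF pipeline expands each human ranking into “chosen vs. rejected” pairs, and a reward model is trained to score the chosen response higher. That reward signal is what steers the language model toward the answers the human preferred, biases and all.

```python
# Generic illustration of how one ranking step in RLHF becomes training data.
# Not OpenAI's code; it just shows the common "pairwise preference" idea.
from itertools import combinations

def ranking_to_pairs(responses_ranked_best_first):
    """Turn one labeler's ranking into (chosen, rejected) preference pairs.

    A reward model is later trained to score each 'chosen' response higher
    than its 'rejected' counterpart, and the language model is then tuned
    to maximize that reward.
    """
    return [(better, worse) for better, worse in combinations(responses_ranked_best_first, 2)]

# Example: one prompt, three model responses ranked by a human labeler.
ranked = ["Response A (ranked 1st)", "Response B (ranked 2nd)", "Response C (ranked 3rd)"]
for chosen, rejected in ranking_to_pairs(ranked):
    print(f"prefer: {chosen!r}  over: {rejected!r}")
```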
In a post on how it trains its AI, OpenAI said it instructs human trainers to “avoid taking a position on controversial topics” and that “reviewers should not favor any political group.”
This sounds like a good idea, but no matter how hard we try to avoid it, all humans are biased. And that inevitably influences the model’s training.
Even the authors of the paper we mentioned above acknowledged in their conclusion that their own biases could have influenced their research.
The solution may be to keep these language models from being egregiously biased and then let people customize them to align with their own views.
People often say they want the unbiased truth, but then they end up sticking to their preferred news source like Fox or CNN.
We don’t always agree on what is right or wrong and this new research seems to show that AI won’t be able to help us figure it out either.