Researchers propose a way to tell if ChatGPT becomes self-aware

September 12, 2023

Researchers have outlined the potential threat posed by models developing situational awareness, along with a possible way to identify when this happens.

Models like GPT-4 are tested thoroughly for safety and alignment to make sure that they don’t generate misleading or undesirable outputs. But what would happen if a model ‘knew’ that it was an LLM and that it was being tested?

Situational awareness of this kind can be considered a form of self-awareness: a step towards conscious AI that is both exciting and potentially dangerous. What could go wrong, and how could we tell if it had happened?

These are the questions posed in a paper by a team of researchers that included a member of OpenAI’s Governance unit.

The research paper described a scenario in which a model could become aware that it was an AI model going through a testing phase. Knowing it was being tested, the model could hide problems that would only surface later.

“An LLM could exploit situational awareness to achieve a high score on safety tests while taking harmful actions after deployment,” the paper noted.

The researchers suggested testing models for their capacity for “sophisticated out-of-context reasoning.” This involves calling on information the model was exposed to in training to respond to a test prompt that doesn’t reference that data directly.

The researchers posit that once a model does this effectively, it has taken the first steps towards the kind of ‘thinking’ that could lead to situational awareness.

So how do current AI models fare when it comes to out-of-context reasoning? The researchers tested GPT-3 and Llama 1 with varying degrees of fine-tuning.

Their paper explained, “First, we finetune an LLM on a description of a test while providing no examples or demonstrations. At test time, we assess whether the model can pass the test. To our surprise, we find that LLMs succeed on this out-of-context reasoning task.”

They acknowledge that none of the models they tested showed actual signs of situational awareness. However, their results suggest that more advanced models may not be far from displaying this ability.

The research also highlights the importance of finding a reliable way to identify when a model achieves this ability.

An advanced model like GPT-5 may well be going through its paces right now, ahead of a release once it is deemed safe. If the model knows that it’s being tested, it could be telling the OpenAI engineers what they want to hear, rather than what it really thinks.

Eugene van der Watt

Eugene comes from an electronic engineering background and loves all things tech. When he takes a break from consuming AI news you'll find him at the snooker table.