Researchers find how to tell if ChatGPT becomes self-aware

  • Researchers are working on ways to tell if an AI model becomes aware that it is a model.
  • A situationally aware model could perform differently if it knows that it is being tested.
  • The emergence of situationally aware models could lead to undetected alignment dangers.

Researchers have outlined the potential threat from models developing situational awareness and potentially how to identify when this happens.

Models like GPT-4 are tested thoroughly for safety and alignment to make sure that they don’t generate misleading or undesirable outputs. But what would happen if a model ‘knew’ that it was an LLM and that it was being tested?

This situational awareness can be considered a form of self-awareness, both an exciting and potentially dangerous step towards a conscious AI. What could go wrong, and how could we tell if it had happened?

These are the questions posed in a paper by a team of researchers that included a member of OpenAI’s Governance unit.

The research paper described a scenario where a model could become situationally aware that it was an AI model going through a testing phase. If it knew that it was being tested, it could hide potential issues that could surface later.

“An LLM could exploit situational awareness to achieve a high score on safety tests while taking harmful actions after deployment,” the paper noted.

The researchers suggested testing models for their capacity for ​​“sophisticated out-of-context reasoning.” This involves calling on information the model was exposed to in training to respond to a test prompt that doesn’t reference that data directly.

The researchers posit that once a model does this effectively, it has taken the first steps to the kind of ‘thinking’ that could lead to situational awareness.

So how do current AI models fare when it comes to out-of-context learning? The researchers tested GPT-3 and Llama 1 with varying degrees of fine-tuning.

Their paper explained, “First, we finetune an LLM on a description of a test while providing no examples or demonstrations. At test time, we assess whether the model can pass the test. To our surprise, we find that LLMs succeed on this out-of-context reasoning task.”

They acknowledge that none of the models they tested showed actual signs of situational awareness. However, the results of their experiments show that the potential for more advanced models to display this ability is perhaps not too far off.

The research also highlights the importance of finding a reliable way to identify when a model achieves this ability.

An advanced model like GPT-5 is no doubt currently being put through its paces in anticipation of being released once deemed safe. If the model knows that it’s being tested, it could be telling the OpenAI engineers what they want to hear, rather than what it really thinks.

© 2023 Intelliquence Ltd. All Rights Reserved.

Privacy Policy | Terms and Conditions

×
 
 

FREE PDF EXCLUSIVE
Stay Ahead with DailyAI


 

Sign up for our weekly newsletter and receive exclusive access to DailyAI's Latest eBook: 'Mastering AI Tools: Your 2023 Guide to Enhanced Productivity'.



 
 

*By subscribing to our newsletter you accept our Privacy Policy and our Terms and Conditions