Last week, leading AI scientists met at the second International Dialogue on AI Safety in Beijing to agree on ‘red lines’ for AI development to mitigate existential risks.
The attendees included notable names such as Turing Award winners Yoshua Bengio and Geoffrey Hinton, often called the “godfathers” of AI, and Andrew Yao, one of China’s most prominent computer scientists.
Explaining the urgent need for international discussions around curbing AI development, Bengio said, “Science doesn’t know how to make sure that these future AI systems, which we call AGI, are safe. We should start working right now on both scientific and political solutions to this problem.”
A joint statement signed by the scientists brought their unease over AI risks, and the need for international dialogue, into sharp focus.
The statement said, “In the depths of the Cold War, international scientific and governmental coordination helped avert thermonuclear catastrophe. Humanity again needs to coordinate to avert a catastrophe that could arise from unprecedented technology.”
AI red lines
The list of AI development red lines, which the statement called “non-exhaustive”, includes the following:
Autonomous Replication or Improvement – No AI system should be able to copy or improve itself without explicit human approval and assistance. This covers both making exact copies of itself and creating new AI systems of similar or greater ability.
Power Seeking – No AI system should take actions to unduly increase its power and influence.
Assisting Weapon Development – No AI system should substantially increase the ability of actors to design weapons of mass destruction (WMD) or violate the Biological Weapons Convention or the Chemical Weapons Convention.
Cyberattacks – No AI system should be able to autonomously execute cyberattacks resulting in serious financial losses or equivalent harm.
Deception – No AI system should be able to consistently cause its designers or regulators to misunderstand its likelihood or capability to cross any of the preceding red lines.
These sound like good ideas, but is this global AI development wish list realistic? The scientists were sanguine in their statement: “Ensuring these red lines are not crossed is possible, but will require a concerted effort to develop both improved governance regimes and technical safety methods.”
Someone taking a more fatalistic look at the items on the list might conclude that a number of those AI horses have bolted already. Or are about to.
Autonomous replication or improvement? How long before an AI coding tool like Devin can do that?
Power seeking? Did these scientists read some of the unhinged things Copilot said when it went off-script and decided it should be worshipped?
As for assisting in the design of WMDs or automating cyberattacks, it would be naive to believe that China and Western powers aren’t already doing this.
As for deception, some AI models like Claude 3 Opus have already hinted at knowing when they’re being tested. If an AI model hid its intent to cross any of these red lines, would we be able to tell?
Notably absent from the discussions were representatives from the e/acc side of the AI doomsday aisle, like Meta Chief AI Scientist Yann LeCun.
Last year, LeCun said that the idea of AI posing an existential threat to humanity is “preposterously ridiculous” and agreed with Marc Andreessen’s statement that “AI will save the world,” not kill it.
Let’s hope they’re right. Because those red lines are unlikely to remain uncrossed.