Google has published the first version of its Frontier Safety Framework, a set of protocols that aim to address severe risks that powerful frontier AI models of the future might present.
The framework defines Critical Capability Levels (CCLs), which are thresholds at which models may pose heightened risk without additional mitigation.
It then lays out different levels of mitigations to address models that breach these CCLs. The mitigations fall into two main categories:
- Security mitigations – Preventing exposure of the weights of a model that reaches CCLs
- Deployment mitigations – Preventing misuse of a deployed model that reaches CCLs
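For readers who think in code, here is a minimal sketch of how those building blocks fit together. The class names, fields, and numeric levels below are illustrative placeholders, not anything defined in Google's document; the framework itself specifies its own tiers of security and deployment mitigations.

```python
from dataclasses import dataclass
from enum import Enum


class MitigationType(Enum):
    SECURITY = "security"      # protect the model's weights from exposure
    DEPLOYMENT = "deployment"  # restrict or monitor how a deployed model is used


@dataclass
class CriticalCapabilityLevel:
    """A capability threshold at which a model may pose heightened risk."""
    domain: str           # e.g. "Autonomy", "Biosecurity"
    description: str
    security_level: int   # placeholder: how strong the security mitigations must be
    deployment_level: int  # placeholder: how strong the deployment mitigations must be


# Illustrative example only -- the levels here are made up, not Google's.
autonomy_ccl = CriticalCapabilityLevel(
    domain="Autonomy",
    description="Model can autonomously acquire resources and sustain copies of itself",
    security_level=1,
    deployment_level=1,
)
```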
The release of Google’s framework comes in the same week that OpenAI’s Superalignment safety team fell apart.
Google seems to be taking potential AI risks seriously, noting that its preliminary analyses focused on the Autonomy, Biosecurity, Cybersecurity, and Machine Learning R&D domains: “Our initial research indicates that powerful capabilities of future models seem most likely to pose risks in these domains.”
The CCLs the framework addresses are:
- Autonomy – A model that can expand its capabilities by “autonomously acquiring resources and using them to run and sustain additional copies of itself on hardware it rents.”
- Biosecurity – A model capable of significantly enabling an expert or non-expert to develop known or novel biothreats.
- Cybersecurity – A model capable of fully automating cyberattacks or enabling an amateur to carry out sophisticated and severe attacks.
- Machine Learning R&D – A model that could significantly accelerate or automate AI research at a cutting-edge lab.
The autonomy CCL is particularly concerning. We’ve all seen the sci-fi movies where AI takes over, but now it’s Google saying that future work is needed to protect “against the risk of systems acting adversarially against humans.”
Google’s approach is to periodically review its models using a set of “early warning evaluations” designed to flag a model that may be approaching the CCLs.
When a model displays early signs of these critical capabilities, the mitigation measures would be applied.
An interesting comment in the framework is that Google says, “A model may reach evaluation thresholds before mitigations at appropriate levels are ready.”
So a model in development might display critical capabilities that could be misused, and Google may not yet have a way to prevent that. In that case, Google says development of the model would be put on hold.
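To make that decision process concrete, here is a rough Python sketch of the logic described above: evaluate a model, apply mitigations when it shows early signs of a critical capability, and pause development when adequate mitigations aren’t ready. The function names, threshold, and scores are hypothetical; Google’s actual evaluations aren’t public.

```python
# Hypothetical sketch of the review-and-mitigate loop described above.
# Names, the threshold, and the scores are illustrative placeholders.

WARNING_THRESHOLD = 0.8  # placeholder value; not defined in the framework


def apply_security_mitigations(model_name, domain):
    # Stand-in for protecting the model's weights (e.g. tighter access controls).
    print(f"[{model_name}] applying security mitigations for {domain}")


def apply_deployment_mitigations(model_name, domain):
    # Stand-in for restricting or monitoring how the deployed model is used.
    print(f"[{model_name}] applying deployment mitigations for {domain}")


def pause_development(model_name, reason):
    # Stand-in for putting further development of the model on hold.
    print(f"[{model_name}] development paused: {reason}")


def review_model(model_name, early_warning_scores, mitigations_ready):
    """Periodic review: check each domain's early-warning score,
    mitigate where possible, otherwise pause development."""
    for domain, score in early_warning_scores.items():
        if score < WARNING_THRESHOLD:
            continue  # no early signs of this critical capability
        if mitigations_ready.get(domain, False):
            apply_security_mitigations(model_name, domain)
            apply_deployment_mitigations(model_name, domain)
        else:
            # "A model may reach evaluation thresholds before mitigations
            # at appropriate levels are ready" -- in that case, pause.
            pause_development(model_name, f"{domain} threshold reached without mitigations")


# Illustrative usage with made-up scores.
review_model(
    "future-model-x",
    early_warning_scores={"Autonomy": 0.85, "Cybersecurity": 0.4},
    mitigations_ready={"Autonomy": False},
)
```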
We can perhaps take some comfort from the fact that Google seems to be taking AI risks seriously. Are they being overly cautious, or are the potential risks that the framework lists worth worrying about?
Let’s hope we don’t find out too late. Google says, “We aim to have this initial framework implemented by early 2025, which we anticipate should be well before these risks materialize.”
If you’re already concerned about AI risks, reading the framework will only heighten those fears.
The document notes that the framework will “evolve substantially as our understanding of the risks and benefits of frontier models improves,” and that “there is significant room for improvement in understanding the risks posed by models in different domains.”