AI agents are performing complex goal-oriented tasks with limited supervision. A team of researchers has proposed three measures that could increase visibility into AI agents to make them safer.
Most people think of AI in terms of a chatbot like ChatGPT: you prompt it with some text and it generates a response. The really exciting development is AI deployed as an agent, a system that can autonomously carry out tasks in pursuit of an end goal.
A simple example is the Rabbit R1 device, which uses an AI agent to browse the web and book a flight for a user. These agents operate with limited supervision: we have little visibility into how they accomplish their tasks or which other agents they interact with along the way.
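To make the chatbot/agent distinction concrete, here is a minimal, purely illustrative agent loop in Python. None of it comes from the paper or from any real product; `model_decide`, the stub tool, and the flight-search scenario are hypothetical stand-ins for an LLM call and real tool integrations.

```python
# Hypothetical sketch of an agent loop: the model repeatedly chooses a
# tool action, the system executes it, and the result is fed back in,
# until the model decides the goal is met. All names are stand-ins.

def model_decide(goal: str, history: list) -> dict:
    """Stand-in for an LLM call that picks the next tool action."""
    if not history:
        return {"tool": "search_flights", "args": {"route": "JFK-LHR"}}
    return {"tool": "done", "args": {}}

def search_flights(route: str) -> str:
    """Stub tool: a real agent would call a flight-booking API here."""
    return f"cheapest flight on {route}: $420"

TOOLS = {"search_flights": search_flights}

def run_agent(goal: str) -> list:
    """The agent loop: decide, act, observe, repeat."""
    history = []
    while True:
        action = model_decide(goal, history)
        if action["tool"] == "done":
            return history
        result = TOOLS[action["tool"]](**action["args"])
        history.append((action, result))

print(run_agent("book the cheapest JFK-LHR flight"))
```

The key difference from a chatbot is the loop: the model's output is not shown to a user but executed as an action whose result feeds the next decision, which is exactly where supervision gets hard.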
The researchers investigated the potential risks AI agents pose as well as how to mitigate these risks by increasing visibility into where, why, how, and by whom certain AI agents are used.
The authors of the paper were from the Quebec AI Institute, Harvard University, Harvard Law School, University of Oxford, Cooperative AI Foundation, University of Cambridge, and University of Toronto.
AI agent risks
If an AI agent is given a goal to optimize for and no human is kept in the loop, it could cut ethical or legal corners on the way to that goal, or act in ways that cause substantial harm.
The researchers identified five main risks associated with poorly supervised AI agents.
- Malicious use – A low-skilled malicious actor could use a highly capable agent to automate end-to-end pipelines in cybercrime or develop harmful biological or chemical tools.
- Overreliance and disempowerment – Overreliance on AI agents to automate complex tasks in high-stakes situations like financial or legal systems could have severe consequences.
- Delayed and diffuse impacts – If an AI is provided with long-horizon goals, the impact of bad decisions it makes may only become apparent long after it is deployed. Delayed impacts could also result in the damage diffusing throughout an organization or community before it is detected.
- Multi-agent risks – An AI agent may have been tested as a single system, but unexpected risks could arise when multiple agents interact with each other.
- Sub-agents – An agent could decide that, to achieve its goal, it needs to create sub-agents of itself. It could spawn multiple copies of itself or fine-tune another agent. Without knowing whether an agent has created sub-agents, detecting harmful behavior is difficult.
How to increase visibility
We need governance structures to establish guidelines that keep key stakeholders accountable. To achieve this, greater visibility into deployed AI agents is essential.
The researchers proposed three ways this could be done; a minimal sketch of how the three measures might fit together follows the list.
- Agent identifiers – Ensuring that an agent identifies itself would enable humans, organizations, or other agents to manage their interactions with it. It would also aid in linking an action to a specific agent and the organization that made or deployed it.
- Real-time monitoring – Monitoring an agent's behavior in real time would let a developer immediately flag violations of clear rules or observe the agent's interactions with other agents. It could also flag an agent that creates multiple instances of itself, exceeds financial limits, or uses computing resources above a set threshold.
- Activity logs – Realistically, harms caused by agents will be detected after the fact, rather than preempted. Activity logs recording certain inputs and outputs of an agent would be useful in determining how things went wrong and what to fix.
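The paper argues for these measures at the governance level and doesn't prescribe an implementation, but a toy sketch can show how the three might compose. Everything below is my own illustration: the class names, the rule thresholds, and the JSON-lines log format are assumptions, not anything specified by the researchers.

```python
import json
import time
import uuid
from dataclasses import dataclass

@dataclass
class AgentIdentity:
    """Agent identifier: ties every action to an agent and its deployer."""
    agent_id: str
    deployer: str

@dataclass
class Monitor:
    """Real-time monitor: enforces simple, clear rules as actions happen."""
    max_subagents: int = 3        # assumed threshold, for illustration
    max_spend_usd: float = 100.0  # assumed threshold, for illustration
    subagents: int = 0
    spend_usd: float = 0.0

    def check(self, action: dict) -> None:
        if action["type"] == "spawn_subagent":
            self.subagents += 1
            if self.subagents > self.max_subagents:
                raise RuntimeError("rule violated: too many sub-agents")
        elif action["type"] == "spend":
            self.spend_usd += action["amount_usd"]
            if self.spend_usd > self.max_spend_usd:
                raise RuntimeError("rule violated: spending limit exceeded")

class ActivityLog:
    """Activity log: append-only record of actions for post-hoc audits."""
    def __init__(self, path: str):
        self.path = path

    def record(self, identity: AgentIdentity, action: dict) -> None:
        entry = {"ts": time.time(), "agent_id": identity.agent_id,
                 "deployer": identity.deployer, "action": action}
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")

# Every action carries an identifier, is checked in real time, and is logged.
identity = AgentIdentity(agent_id=str(uuid.uuid4()), deployer="example-corp")
monitor, log = Monitor(), ActivityLog("agent_activity.jsonl")
for action in [{"type": "spend", "amount_usd": 40.0},
               {"type": "spawn_subagent"}]:
    monitor.check(action)         # flag rule violations immediately
    log.record(identity, action)  # keep an audit trail for later review
```

The division of labor mirrors the list above: the identifier answers "who did this?", the monitor catches rule violations as they happen, and the log preserves the trail for the after-the-fact investigations the researchers expect to be the common case.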
If these measures were implemented, they could help mitigate the risks that AI agents present. However, implementing real-time monitoring and activity logs without breaching privacy laws could be challenging.
We’re definitely moving away from app-based interactions as AI agents increasingly act independently to do the jobs we set for them. But there are risks. The paper noted that managing these risks will require political will, sociotechnical infrastructure, and public influence. Improved visibility into exactly how AI agents operate is crucial to making that happen.