Meta, UCSD introduce ToolVerifier to improve LLM tool calls

Researchers from Meta and the University of California San Diego (UCSD) developed ToolVerifier, a method that improves how LLMs call and interact with software tools.

For LLMs to become useful as general assistants or agents, they need to be taught how to use various tools or APIs. Fine-tuning an LLM to use a specific tool does work, but the real challenge is for an LLM to interact with new tools without the need for fine-tuning or few-shot demonstrations.

When two tools are very similar, it can be especially challenging for the LLM to choose the correct one to accomplish its goal. The current approach of providing several few-shot examples for each tool can also consume much of the LLM's context window.

ToolVerifier is a self-verification method that enables the LLM to ask itself questions so it can work out which tool to use and what parameters to pass to the tool.

To help the LLM, ToolVerifier first selects the most suitable tool from a library of options and then generates the appropriate parameters. At each of these steps, it generates questions to help evaluate its choices and discriminate between similar candidate tools.

Here’s an example from the research paper showing the process of tool selection and parameter clarification.

ToolVerifier first identifies the top two tools and generates a verification question. The answer to the question leads to the final tool choice. A similar method is used to generate parameters. Source: arXiv
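The select-then-verify flow described above can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' implementation: the `Tool` class, the `select_tool` function, the prompt wording, and the `fake_llm` stand-in are all hypothetical, and the "LLM" is assumed to be any function that takes a prompt string and returns a text reply.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Tool:
    """A tool known only by its name and description (hypothetical type)."""
    name: str
    description: str


# Assumed interface: a prompt string goes in, the model's text reply comes out.
LLM = Callable[[str], str]


def select_tool(request: str, tools: Sequence[Tool], llm: LLM) -> Tool:
    """Shortlist two tools, ask a verification question, then pick one."""
    by_name = {t.name: t for t in tools}
    library = "\n".join(f"- {t.name}: {t.description}" for t in tools)

    # Step 1: ask the model for its top two candidates.
    reply = llm(
        f"Tools:\n{library}\nRequest: {request}\n"
        "Name the two most suitable tools, comma-separated."
    )
    names = [n.strip() for n in reply.split(",")]
    candidates = [by_name[n] for n in names if n in by_name][:2]
    if not candidates:
        raise ValueError("model returned no known tool names")
    if len(candidates) == 1:
        return candidates[0]

    # Step 2: ask the model for a question that separates the two candidates.
    first, second = candidates
    question = llm(
        f"Write one question that distinguishes {first.name} "
        f"({first.description}) from {second.name} ({second.description})."
    )

    # Step 3: have the model answer that question and commit to one tool.
    final = llm(
        f"Request: {request}\nQuestion: {question}\n"
        f"Answer the question, then reply with only the tool name: "
        f"{first.name} or {second.name}."
    ).strip()
    # Fall back to the first candidate if the reply is not the second one.
    return second if final == second.name else first


def fake_llm(prompt: str) -> str:
    """Canned replies standing in for a real model (illustration only)."""
    if "two most suitable" in prompt:
        return "BookFlight, BookTrain"
    if "Write one question" in prompt:
        return "Does the user want to travel by air?"
    return "BookTrain"


TOOLS = [
    Tool("BookFlight", "Books a seat on a commercial flight."),
    Tool("BookTrain", "Books a seat on a train."),
    Tool("CheckWeather", "Returns the weather forecast for a city."),
]

chosen = select_tool("Get me a rail ticket to Paris on Friday", TOOLS, fake_llm)
print(chosen.name)
```

Swapping `fake_llm` for a call to a real model would be the only change needed to try the idea out. Parameter generation could follow the same pattern, with the verification question aimed at the parameter values instead of the tool choice.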

ToolVerifier was trained on data consisting of a list of synthetic tools including travel, banking, and calendar tools and their associated descriptions. It was trained to select the appropriate tool based purely on the title and description.

Once ToolVerifier had been trained on tool selection and parameter verification, the researchers tested it on four tasks from the ToolBench benchmark that required Llama 2-70B to interact with 17 previously unseen tools.

According to the paper, the ToolVerifier method resulted in “an average improvement of 22% over few-shot baselines, even in scenarios where the distinctions between candidate tools are finely nuanced.”

Success rate (%) for the Weather, Booking, Home, and Cat tasks from the ToolBench benchmark, comparing models with and without ToolVerifier. Source: arXiv

The results show that ToolVerifier substantially improves an LLM’s tool selection and the accuracy of its parameter generation. The method was trained and tested only on single-tool interactions, not multi-tool ones, but it’s promising nonetheless.

Tool-augmented LLMs are an exciting development in using AI as a generalized agent. Once LLMs learn to use multiple tools to achieve a goal, they will be even more useful to us than they already are.

A future where an AI assistant books a flight, coordinates a meeting, or does your grocery shopping for you doesn’t seem very far off.

Eugene van der Watt
