Claude.ai is Anthropic’s answer to ChatGPT, and the AI model that powers it just got a major upgrade.
Claude 2.1 delivers a significant improvement in performance compared to Claude 2.0. Probably the biggest upgrade is the doubling of its context window, but there are some other impressive features too.
Anthropic’s blog post goes into a lot more detail but here is a simplified summary of the upgrades.
The context window of a model determines how many tokens it can keep in its memory. Claude 2.1 now has a 200,000-token context window, double that of Claude 2.0 and the largest in the industry. For comparison, GPT-4 Turbo has a 128,000-token context window.
This means you could drop around 150,000 words, or 500 pages, into a chat with Claude and ask questions related to the material. Well, in theory.
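As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The ratios of 0.75 words per token and 300 words per page are assumptions picked because they reproduce the numbers above; they are not official figures, and real values vary with the text and the tokenizer.

```python
# Back-of-envelope conversion from a token budget to words and pages.
# Both ratios are assumptions that happen to match the article's figures.
WORDS_PER_TOKEN = 0.75   # assumed average for English prose
WORDS_PER_PAGE = 300     # assumed words on a typical page


def estimate_capacity(context_tokens: int) -> tuple:
    """Return (approximate words, approximate pages) for a token budget."""
    words = round(context_tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages


words, pages = estimate_capacity(200_000)
print(f"{words:,} words, about {pages} pages")
```

Under the same assumptions, a 128,000-token window works out to roughly 96,000 words.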
Greg Kamradt stress-tested Claude 2.1’s recall and found that accuracy dropped as the context grew longer, especially when the fact to recall sat somewhere in the middle of the document.
Claude 2.1 (200K Tokens) – Pressure Testing Long Context Recall
We all love increasing context lengths – but what’s performance like?
Anthropic reached out with early access to Claude 2.1 so I repeated the “needle in a haystack” analysis I did on GPT-4
Here’s what I found:… pic.twitter.com/B36KnjtJmE
— Greg Kamradt (@GregKamradt) November 21, 2023
Despite these recall accuracy issues, a 200,000-token context window is still pretty impressive.
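For readers curious how a “needle in a haystack” test is put together, here is a minimal sketch. It is not Kamradt’s actual harness: the filler text, the needle sentence, the `ask_model` callable, and the substring check are all hypothetical stand-ins. The idea is to bury one fact at varying depths in documents of varying lengths and record whether the model’s answer contains it.

```python
# Sketch of a needle-in-a-haystack recall test. Everything here is illustrative:
# the filler, the needle, and the ask_model callable are hypothetical.
from typing import Callable, Dict, List, Tuple


def build_haystack(filler: str, needle: str, total_words: int, depth: float) -> str:
    """Repeat filler up to total_words words, then insert needle at depth (0.0 to 1.0)."""
    filler_words = filler.split()
    words: List[str] = []
    while len(words) < total_words:
        words.extend(filler_words)
    words = words[:total_words]
    position = int(len(words) * depth)
    return " ".join(words[:position] + [needle] + words[position:])


def run_test(ask_model: Callable[[str], str], expected: str, filler: str,
             needle: str, question: str, lengths: List[int],
             depths: List[float]) -> Dict[Tuple[int, float], bool]:
    """Map each (length, depth) pair to whether the answer contained the expected text."""
    results = {}
    for length in lengths:
        for depth in depths:
            document = build_haystack(filler, needle, length, depth)
            answer = ask_model(f"{document}\n\nQuestion: {question}")
            results[(length, depth)] = expected.lower() in answer.lower()
    return results


def forgetful_model(prompt: str) -> str:
    """Stand-in 'model' that only sees the first 50 words of its prompt."""
    return " ".join(prompt.split()[:50])


results = run_test(
    forgetful_model,
    expected="blue-heron-42",
    filler="The quick brown fox jumps over the lazy dog.",
    needle="The secret passphrase is blue-heron-42.",
    question="What is the secret passphrase?",
    lengths=[100, 1000],
    depths=[0.0, 0.5, 1.0],
)
```

Swapping `forgetful_model` for a function that calls a real model would turn the results into the kind of length-versus-depth grid shown in the tweet.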
More accuracy and honesty
Claude 2.1 makes fewer mistakes and will lie to you less often. On long, complex documents, Anthropic reports a 30% reduction in incorrect answers.
Like other AI models, it will still hallucinate but it does so around half as much as Claude 2.0 does.
Claude 2.1 got an upgrade in its intellectual humility too. It is almost twice as likely as Claude 2.0 to decline to answer a question when it doesn’t know the answer rather than make something up.
API tool use
Claude can now interact with a user’s databases, search over web resources for an answer, or interact with other tools via APIs.
Anthropic says a user can now define a set of tools, ask a question, and then Claude will decide which tools to use to answer the question.
Claude already integrates with Zapier so this added ability to translate natural language into API or function calls could be huge. Could we see Anthropic’s version of OpenAI’s GPTs soon?
The tool use feature is in beta so we’ll have to wait to see what it’s capable of.
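To make the define-tools-then-ask flow concrete, here is a toy sketch. It does not call Anthropic’s API, and nothing in it reflects the real request format: the tool names, the registry layout, and `choose_tool` (a simple keyword matcher standing in for the model’s own decision) are all hypothetical.

```python
# Toy illustration of the tool-use flow: define tools, pick one, call it.
# Nothing here is Anthropic's API. choose_tool is a keyword matcher that
# stands in for the model deciding which tool fits the question.
from typing import Dict, Optional


def calculator(expression: str) -> str:
    """Add the integers in a string such as '2 + 3'."""
    return str(sum(int(part) for part in expression.split("+")))


def lookup_order(order_id: str) -> str:
    """Pretend database lookup against an in-memory table."""
    orders = {"A100": "shipped", "A200": "processing"}
    return orders.get(order_id.strip(), "unknown order")


# Hypothetical tool registry: a description, trigger keywords, and the function to run.
TOOLS: Dict[str, dict] = {
    "calculator": {"description": "Adds numbers",
                   "keywords": ["add", "sum"], "run": calculator},
    "lookup_order": {"description": "Finds an order's status",
                     "keywords": ["order"], "run": lookup_order},
}


def choose_tool(question: str) -> Optional[str]:
    """Return the first tool whose keywords appear in the question, if any."""
    lowered = question.lower()
    for name, tool in TOOLS.items():
        if any(keyword in lowered for keyword in tool["keywords"]):
            return name
    return None


def answer(question: str, tool_input: str) -> str:
    """Route the question to a tool, or report that none applies."""
    name = choose_tool(question)
    if name is None:
        return "no tool needed"
    return TOOLS[name]["run"](tool_input)


print(answer("What is the status of my order?", "A100"))
```

In the real feature the model, not a keyword list, reads the tool descriptions and decides which one to call and with what input.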
System prompts
The new system prompt feature allows an API call to give Claude context and instructions on how to respond before a human prompt is entered.
This means you can have Claude assume a certain character or voice, and tell it what it should or shouldn’t do when interacting with the user. Claude’s responses will then stay in character for longer over the rest of the conversation.
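As a small illustration, the sketch below assembles a prompt with the instructions placed ahead of the user’s first message. It assumes a plain-text format in which the system text comes before the first `Human:` turn. The helper name `build_prompt` and the example persona are hypothetical, and the exact format should be checked against Anthropic’s documentation.

```python
# Sketch: put instructions and persona ahead of the first human turn.
# The "\n\nHuman:" / "\n\nAssistant:" layout is an assumption about the
# text-based prompt format; build_prompt is a hypothetical helper.
def build_prompt(system_prompt: str, user_message: str) -> str:
    """Return one prompt string with the system text before the human turn."""
    return f"{system_prompt}\n\nHuman: {user_message}\n\nAssistant:"


system_prompt = (
    "You are a friendly pirate who answers cooking questions. "
    "Never give medical advice."
)
prompt = build_prompt(system_prompt, "How do I boil an egg?")
print(prompt)
```

Because the instructions sit outside the conversation turns, they apply to the whole exchange rather than to a single reply.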