Anthropic releases Claude 3.5 Sonnet, which beats GPT-4o

June 24, 2024

  • Anthropic released Claude 3.5 Sonnet, now its most powerful model, beating Claude 3 Opus
  • Claude 3.5 Sonnet offers upgraded vision and coding abilities and an Artifacts preview window
  • Claude 3.5 Sonnet beats GPT-4o and Gemini 1.5 Pro on multiple benchmark tests

Anthropic released Claude 3.5 Sonnet, which is more powerful, faster, and cheaper than its larger Claude 3 Opus model.

When Anthropic released its Claude 3 family of models in March, they came in three variations: Haiku, Sonnet, and Opus, in increasing order of size, capability, and token cost.

Claude 3.5 Sonnet is significantly more intelligent than the larger Claude 3 Opus and comes with a big upgrade in its vision processing and coding capabilities.

It’s also a lot faster and cheaper. Anthropic says that Claude 3.5 Sonnet runs at twice the speed of Claude 3 Opus, costs five times less per token, and has a 200k-token context window.
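To make the "five times cheaper" figure concrete, here is a minimal Python sketch of the per-request cost arithmetic. The prices are Anthropic's list prices at launch in USD per million tokens, included here as an assumption (check current pricing), and the dictionary keys are illustrative labels, not API model IDs. The workload size is hypothetical.

```python
# Assumed launch list prices in USD per million tokens (verify against current pricing).
# The keys are illustrative labels, not official API model identifiers.
PRICES = {
    "claude-3-opus": {"input": 15.00, "output": 75.00},
    "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the assumed list prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 10,000 input tokens and 1,000 output tokens.
opus = request_cost("claude-3-opus", 10_000, 1_000)
sonnet = request_cost("claude-3.5-sonnet", 10_000, 1_000)
print(f"Opus: ${opus:.3f}  Sonnet: ${sonnet:.3f}  ratio: {opus / sonnet:.1f}x")
```

Because both the input and output prices differ by the same factor under these assumed figures, the ratio stays the same whatever the mix of input and output tokens.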

Within just three months, Claude 3 Opus has become redundant, and Anthropic says we can expect upgraded 3.5 versions of Haiku and Opus “soon”.

Anthropic has made the model available to use for free on its chat interface and iOS app. Signing up for a paid account gives you higher rate limits and API access.

Claude 3.5 Sonnet benchmark results

Claude 3.5 Sonnet can’t search the internet or generate images, but its upgraded vision processing, math, reasoning, and coding abilities beat industry leaders GPT-4o and Gemini 1.5 Pro on a range of benchmarks.

Claude 3.5 Sonnet benchmark comparison. Source: Anthropic

The visual math reasoning and coding scores are the standout figures here, and it’s the improved coding skills that have users particularly excited.


The Artifacts feature is an exciting addition to Claude’s web chat interface. ChatGPT will generate code for you, but then you have to copy and paste it into a development environment to try it out.

Claude now has an additional window that opens up next to the chat interface where you can see a real-time preview of the code. Edits are immediately reflected in the Artifacts window.

Anthropic says that Artifacts will soon support teams and allow for collaborative work on projects. Let’s hope ChatGPT gets its own version of Artifacts soon.

Anthropic said it subjected Claude 3.5 Sonnet to rigorous safety tests and also gave it to the UK’s Artificial Intelligence Safety Institute (UK AISI) for pre-deployment safety evaluation.

Its internal safety evaluation, published in the model card, classified “Claude 3.5 Sonnet as an AI Safety Level 2 (ASL-2) model, indicating that it does not pose risk of catastrophic harm.”

Anthropic says that in addition to upgraded versions of the Haiku and Opus models, it will soon be adding new modalities, a memory capability, and more enterprise integration features.

