Claude Sonnet 5: Features, Benchmarks, Pricing, and More

Claude Sonnet 5 nears Opus 4.8 on agentic benchmarks at lower cost. Discover its features, benchmarks, pricing, and more.

2026年6月30日 · 9 分読む

Anthropic released Claude Sonnet 5 on June 30, 2026, and the pitch is straightforward: this is the most agentic Sonnet model the company has shipped. It can make plans, drive tools like browsers and terminals, and run autonomously at a level that, until recently, only larger Opus-class models could reach.

The headline claim is that Sonnet 5 performs close to Opus 4.8 across reasoning, tool use, coding, and knowledge work, but at a lower price. On the agentic coding benchmark Anthropic published, Sonnet 5 scores 63.2% against Opus 4.8's 69.2% and Sonnet 4.6's 58.1%. On one knowledge work benchmark, Sonnet 5 actually edges past Opus 4.8.

In this article, I'll cover everything new with Claude Sonnet 5, looking at the new features, exploring the benchmarks, and seeing how much it costs. If you want more Anthropic context, see our recent guide to Claude Code slash commands and our look at Claude Tag.

What Is Claude Sonnet 5?

Claude Sonnet 5 is Anthropic's mid-tier model, sitting below the Opus line and built to handle agentic work that previously demanded the more expensive Opus models. It replaces Sonnet 4.6, which launched in February 2026, and uses an updated tokenizer that changes how the model processes text.

For many developers, the agentic era began with Sonnet-class models like Claude Sonnet 3.5, 3.6, and 3.7, which were the first Claude releases to show real skill at coding and tool use. More recently, the clearest agentic gains had moved to Opus-class models, and Sonnet 5 is Anthropic's attempt to pull the mid-tier back up toward the frontier.

Source: Anthropic

The most telling number is the cost-performance picture. Anthropic's own charts show that Sonnet 4.6 fell well short of Opus 4.8 on the BrowseComp agentic search and OSWorld-Verified computer-use evaluations. Sonnet 5 and Opus 4.8 now cover a single range, with Sonnet 5 offering lower cost and Opus 4.8 offering higher accuracy at a higher price.

What's New With Claude Sonnet 5?

The improvements in Sonnet 5 cluster around one theme: finishing agentic tasks without stopping short. Here are the capabilities that stand out.

Finish multi-step tasks end-to-end

Sonnet 5's biggest practical change is task follow-through. Anthropic's early-access testers reported that the model completes complex tasks where previous Sonnet versions would stall partway, and that it checks its own output without being asked to.

To picture this in your own work, imagine handing the model a job that spans two systems: pull failing test results from a CI run, then open a pull request with a fix. Earlier Sonnet models tended to stop after the first half. Anthropic's own example involved updating Salesforce account tiers and sending a launch announcement to enterprise contacts in a single pass, a workflow a Zapier engineer said used to stall halfway.

For anyone building agents, this is the difference between a model that drafts a plan and one that executes it. It reduces the number of times a human has to step in to nudge the agent forward.

Tune effort levels to balance cost and accuracy

Sonnet 5 supports adjustable effort levels, so you can dial reasoning depth up or down depending on the task and budget. Anthropic positions this as a way to find the right point between Sonnet 5 and Opus 4.8 rather than picking one model and living with it.

At its maxed-out Extra High reasoning level, Sonnet 5 performs roughly in line with Opus 4.8's medium-to-high setting on OSWorld-Verified and BrowseComp. The catch is that running Sonnet 5 at that level can cost more than Opus 4.8 at a comparable reasoning setting, so Opus 4.8 remains the better choice for some high-accuracy tasks.

If you already use Claude Code, effort levels will be familiar. We cover the /effort command and its levels in our Claude Code slash commands tutorial.

Run agents more safely in production

Sonnet 5 ships with measurable safety improvements over Sonnet 4.6, which matters when an agent is touching live systems. Anthropic's pre-deployment evaluations found that it is better at refusing malicious requests and resisting hijack attempts in prompt injection attacks.

The model also shows lower rates of hallucination and sycophancy than Sonnet 4.6, and scored lower (safer) on Anthropic's automated behavioral audit covering misaligned behaviors like cooperation with misuse and deception. Lovable's co-founder framed the value plainly: a model that knows when to say no is as important as one that knows how to build.

The caveat is that Sonnet 5 still shows higher rates of misaligned behavior than the more capable Opus 4.8 and Claude Mythos Preview, so it is safer than its predecessor but not the safest model in Anthropic's lineup.

Source: Anthropic

Run with cyber safeguards enabled by default

Sonnet 5 launches with real-time cyber safeguards turned on, the same ones present in Claude Opus 4.7 and 4.8. These detect and block dangerous cyber usage as it happens.

On Anthropic's evaluations, Sonnet 5 has substantially weaker dangerous-cyber skills than Opus 4.8 and Mythos 5. In a test built with Mozilla to develop exploits for vulnerabilities in Firefox 147, neither Sonnet 5 nor Sonnet 4.6 ever produced a working exploit (both scored 0.0%), though Sonnet 5 showed a slightly higher partial-success rate, which Anthropic attributes to general intelligence gains rather than cyber training.

For security researchers who need reduced guardrails, Anthropic recommends Opus 4.8 instead. The safeguards on Sonnet 5 are less strict than those that shipped with Fable 5, which blocked a wider range of cybersecurity tasks.

Claude Sonnet 5 Benchmarks

Anthropic's benchmark story for Sonnet 5 is consistent: it is a strict improvement over Sonnet 4.6 and sits just below Opus 4.8 on the evaluations published so far. The numbers below come from Anthropic's launch materials. One note of caution: third-party reviewers have reported different figures for the same benchmarks, likely due to different datasets, context settings, or agent scaffolds, so treat headline numbers as configuration-dependent.

	Sonnet 5	Sonnet 4.6	Opus 4.8 For reference
Agentic coding SWE-bench Pro	63.2%	58.1%	69.2%
Agentic coding Terminal-Bench 2.1	80.4%	67.0%	82.7%
Multidisciplinary reasoning Humanity's Last Exam	43.2% no tools 57.4% with tools	34.6% no tools 46.8% with tools	49.8% no tools 57.9% with tools
Computer use OSWorld-Verified	81.2%	78.5%	83.4%
Knowledge work GDPval-AA v2	1618	1395	1615

Agentic coding

On the agentic coding benchmark Anthropic published, Sonnet 5 scores 63.2%, compared to Opus 4.8's 69.2% and Sonnet 4.6's 58.1%. This benchmark measures whether a model can write, run, and fix code across multiple steps rather than producing a single snippet.

The gap to Opus 4.8 is about 6 points, while the jump over Sonnet 4.6 is roughly 5 points. For developers, that means Sonnet 5 is a clear upgrade over the previous mid-tier model without quite matching the flagship.

Source: Anthropic

OSWorld-Verified and BrowseComp

OSWorld-Verified measures computer-use ability, controlling a desktop to complete real tasks, and BrowseComp measures agentic web search. Anthropic updated its OSWorld-Verified methodology and now reports Sonnet 4.6 at 78.5% on the revised setup.

Across both evaluations, Sonnet 5 is a strict improvement over Sonnet 4.6 at every effort level, while Opus 4.8 remains the higher-accuracy choice. At Extra High effort, Sonnet 5 reaches roughly Opus 4.8's medium-to-high performance, but at that point the cost advantage narrows, which is why Anthropic frames the two models as a single cost-accuracy range rather than a straight upgrade.

Knowledge work

On a knowledge work benchmark, Sonnet 5 slightly outperforms Opus 4.8, according to TechCrunch's reporting. That is notable because Opus 4.8 is the model usually associated with the hardest judgment calls and deep research.

Anthropic also updated its grader for Humanity's Last Exam and now reports Sonnet 4.6 at 34.6% (no tools) and 46.8% (with tools), which is why those figures differ from the original Sonnet 4.6 launch. For practitioners, the knowledge-work result suggests Sonnet 5 is viable for analysis and research tasks that previously felt like Opus territory.

How it compares to other Anthropic models

For a broader context on where the Sonnet tier sits, our coverage of Sakana Fugu vs Claude Fable 5 is useful. In that comparison, the higher-tier Claude Fable 5 scored 80.3% on SWE-Bench Pro and 59.0% on Humanity's Last Exam (no tools), well above the mid-tier figures here.

That spread is the point. Anthropic's lineup runs from the agentic, lower-cost Sonnet 5 up through Opus 4.8 and the Mythos and Fable classes, with each tier trading cost for accuracy on the hardest problems.

Claude Sonnet 5 Pricing and Availability

Claude Sonnet 5 is available everywhere from launch day. It is the default model for Free and Pro plans, and is available to Max, Team, and Enterprise users, as well as in Claude Code and on the Claude Platform.

Developers can call it via the Claude API using the model ID claude-sonnet-5. Pricing works in two phases:

Introductory pricing (through August 31, 2026): $2 per million input tokens and $10 per million output tokens
Standard pricing (after August 31, 2026): $3 per million input tokens and $15 per million output tokens

One detail to budget for: Sonnet 5 uses an updated tokenizer, so the same input can map to more tokens than before, roughly 1.0 to 1.35 times depending on content type. Anthropic set the introductory pricing so the transition from Sonnet 4.6 is roughly cost-neutral. The company also raised rate limits across Chat, Cowork, Claude Code, and the Claude Platform to handle the heavier token usage that higher effort levels bring.

Claude Sonnet 5 vs. Sonnet 4.6 and Opus 4.8 at a Glance

For readers who want the quick comparison, here is how the three models line up on the figures Anthropic published.

Model	Agentic coding	OSWorld-Verified	Input / output price (per 1M tokens)
Sonnet 5	63.2%	Improves on 4.6 at all effort levels	$2 / $10 (intro), then $3 / $15
Sonnet 4.6	58.1%	78.5% (revised)	$3 / $15
Opus 4.8	69.2%	Higher-accuracy choice	$5 / $25

Final Thoughts

Sonnet 5 is Anthropic's clearest statement yet that agentic capability is now the baseline expectation at every price tier, not a flagship-only feature. The differentiator is no longer who can do agentic work best, but who can do it cheaply and reliably without human oversight.

What I find most interesting is the introductory pricing. Anthropic seems to want customers to test Sonnet 5 against real workloads at the lowest possible cost during the migration window, which reads like a deliberate push to move people off Opus 4.8 for routine agentic work and free up that capacity. It also undercuts OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro on price, though Gemini 3.5 Flash remains cheaper.

If you are building agents that run sustained, multi-step tasks, Sonnet 5 looks like the new default to reach for first, dropping up to Opus 4.8 only when a task genuinely needs the extra accuracy. The honest caveat is that some third-party reviewers see Sonnet 5 trading reasoning depth for coding speed, so for PhD-level science or browsing-heavy work, Opus 4.8 may still be the safer pick.

If you want to get hands-on with the kind of automation Sonnet 5 is built for, I recommend our Claude Code Routines tutorial, which walks through running a coding agent on a schedule in the cloud.

How does Claude Sonnet 5 compare to Opus 4.8?

Where can I access Claude Sonnet 5?

What does Claude Sonnet 5 cost?

What safety standards does Claude Sonnet 5 follow?

What use cases is Claude Sonnet 5 best for?

Author

Matt Crabtree

トピック

Artificial Intelligence

Large Language Models

Top DataCamp Courses

Courses

Claude モデル入門

3時間

11.2K

Anthropic API を用い、AI アプリの構築やビジネスの課題解決に Claude を活用する方法を学びます。

詳細を見る

コースを開始

Courses

Software Development with Claude Code

4時間

4.3K

Claude Code brings AI assistance to your terminal. Learn the workflows that turn it into a reliable tool for real software development.

詳細を見る

コースを開始

Courses

Claude Code 101

3時間

15.5K

Learn how to use Claude Code effectively in your daily development workflows.

詳細を見る

コースを開始

Claude 4: Tests, Features, Access, Benchmarks, and More

Learn about Claude Sonnet 4 and Claude Opus 4, their features, use cases, benchmarks, and testing results.

Alex Olteanu

8 分

blogs

Claude Sonnet 4.6: Features, Access, Tests, and Benchmarks

Explore Anthropic’s Claude Sonnet 4.6, featuring a 1M token context window, near-Opus performance, and advanced agentic capabilities for coding and finance.

Tom Farnschläder

10 分

Claude Sonnet 4.5 hailed as the best at coding in the world

blogs

Claude Sonnet 4.5: Tests, Features, Access, Benchmarks, and More

Learn about Claude Sonnet 4.5, the ‘best coding model in the world’. Explore new features, use cases, benchmarks, and testing results, plus a look at the Claude Agents SDK and Claude Imagine.

Matt Crabtree

8 分

blogs

Claude 3.7 Sonnet: Features, Access, Benchmarks & More

Learn about Claude 3.7 Sonnet's hybrid approach of combining reasoning mode and generalist mode, key benchmarks, and how to access it via web or API.

Alex Olteanu

8 分

blogs

Claude Opus 4.6: Features, Benchmarks, Hands-On Tests, and More

Anthropic’s latest model tops leaderboards in agentic coding and complex reasoning. Plus, it has a 1M context window.

Matt Crabtree

10 分

blogs

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

Claude 3.5 Sonnet outperforms GPT-4o and Gemini Pro 1.5 in several benchmarks and introduces a cool new feature: Artifacts.

Alex Olteanu

8 分

もっと見るもっと見る

What Is Claude Sonnet 5?

What's New With Claude Sonnet 5?

Finish multi-step tasks end-to-end

Tune effort levels to balance cost and accuracy

Run agents more safely in production

Run with cyber safeguards enabled by default

Claude Sonnet 5 Benchmarks

Agentic coding

OSWorld-Verified and BrowseComp

Knowledge work

How it compares to other Anthropic models

Claude Sonnet 5 Pricing and Availability

Claude Sonnet 5 vs. Sonnet 4.6 and Opus 4.8 at a Glance

Final Thoughts

FAQs

What does Claude Sonnet 5 cost?

What safety standards does Claude Sonnet 5 follow?

What use cases is Claude Sonnet 5 best for?

Claude 4: Tests, Features, Access, Benchmarks, and More

Claude Sonnet 4.6: Features, Access, Tests, and Benchmarks

Claude Sonnet 4.5: Tests, Features, Access, Benchmarks, and More

Claude 3.7 Sonnet: Features, Access, Benchmarks & More

Claude Opus 4.6: Features, Benchmarks, Hands-On Tests, and More

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}Claude モデル入門

Software Development with Claude Code

Claude Code 101

Claude 4: Tests, Features, Access, Benchmarks, and More

Claude Sonnet 4.6: Features, Access, Tests, and Benchmarks

Claude Sonnet 4.5: Tests, Features, Access, Benchmarks, and More

Claude 3.7 Sonnet: Features, Access, Benchmarks & More

Claude Opus 4.6: Features, Benchmarks, Hands-On Tests, and More

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

Claude モデル入門