メインコンテンツへスキップ

Claude Sonnet 5: Features, Benchmarks, Pricing, and More

Claude Sonnet 5 nears Opus 4.8 on agentic benchmarks at lower cost. Discover its features, benchmarks, pricing, and more.
2026年6月30日  · 9 分 読む

Anthropic released Claude Sonnet 5 on June 30, 2026, and the pitch is straightforward: this is the most agentic Sonnet model the company has shipped. It can make plans, drive tools like browsers and terminals, and run autonomously at a level that, until recently, only larger Opus-class models could reach.

The headline claim is that Sonnet 5 performs close to Opus 4.8 across reasoning, tool use, coding, and knowledge work, but at a lower price. On the agentic coding benchmark Anthropic published, Sonnet 5 scores 63.2% against Opus 4.8's 69.2% and Sonnet 4.6's 58.1%. On one knowledge work benchmark, Sonnet 5 actually edges past Opus 4.8.

In this article, I'll cover everything new with Claude Sonnet 5, looking at the new features, exploring the benchmarks, and seeing how much it costs. If you want more Anthropic context, see our recent guide to Claude Code slash commands and our look at Claude Tag.

What Is Claude Sonnet 5?

Claude Sonnet 5 is Anthropic's mid-tier model, sitting below the Opus line and built to handle agentic work that previously demanded the more expensive Opus models. It replaces Sonnet 4.6, which launched in February 2026, and uses an updated tokenizer that changes how the model processes text.

For many developers, the agentic era began with Sonnet-class models like Claude Sonnet 3.5, 3.6, and 3.7, which were the first Claude releases to show real skill at coding and tool use. More recently, the clearest agentic gains had moved to Opus-class models, and Sonnet 5 is Anthropic's attempt to pull the mid-tier back up toward the frontier.

Anthropic Sonnet 5 announcement

Source: Anthropic

The most telling number is the cost-performance picture. Anthropic's own charts show that Sonnet 4.6 fell well short of Opus 4.8 on the BrowseComp agentic search and OSWorld-Verified computer-use evaluations. Sonnet 5 and Opus 4.8 now cover a single range, with Sonnet 5 offering lower cost and Opus 4.8 offering higher accuracy at a higher price.

What's New With Claude Sonnet 5?

The improvements in Sonnet 5 cluster around one theme: finishing agentic tasks without stopping short. Here are the capabilities that stand out.

Finish multi-step tasks end-to-end

Sonnet 5's biggest practical change is task follow-through. Anthropic's early-access testers reported that the model completes complex tasks where previous Sonnet versions would stall partway, and that it checks its own output without being asked to.

To picture this in your own work, imagine handing the model a job that spans two systems: pull failing test results from a CI run, then open a pull request with a fix. Earlier Sonnet models tended to stop after the first half. Anthropic's own example involved updating Salesforce account tiers and sending a launch announcement to enterprise contacts in a single pass, a workflow a Zapier engineer said used to stall halfway.

For anyone building agents, this is the difference between a model that drafts a plan and one that executes it. It reduces the number of times a human has to step in to nudge the agent forward.

Tune effort levels to balance cost and accuracy

Sonnet 5 supports adjustable effort levels, so you can dial reasoning depth up or down depending on the task and budget. Anthropic positions this as a way to find the right point between Sonnet 5 and Opus 4.8 rather than picking one model and living with it.

At its maxed-out Extra High reasoning level, Sonnet 5 performs roughly in line with Opus 4.8's medium-to-high setting on OSWorld-Verified and BrowseComp. The catch is that running Sonnet 5 at that level can cost more than Opus 4.8 at a comparable reasoning setting, so Opus 4.8 remains the better choice for some high-accuracy tasks.

If you already use Claude Code, effort levels will be familiar. We cover the /effort command and its levels in our Claude Code slash commands tutorial.

Run agents more safely in production

Sonnet 5 ships with measurable safety improvements over Sonnet 4.6, which matters when an agent is touching live systems. Anthropic's pre-deployment evaluations found that it is better at refusing malicious requests and resisting hijack attempts in prompt injection attacks.

The model also shows lower rates of hallucination and sycophancy than Sonnet 4.6, and scored lower (safer) on Anthropic's automated behavioral audit covering misaligned behaviors like cooperation with misuse and deception. Lovable's co-founder framed the value plainly: a model that knows when to say no is as important as one that knows how to build.

The caveat is that Sonnet 5 still shows higher rates of misaligned behavior than the more capable Opus 4.8 and Claude Mythos Preview, so it is safer than its predecessor but not the safest model in Anthropic's lineup.

Sonnet 5 and misaligned behavior

Source: Anthropic

Run with cyber safeguards enabled by default

Sonnet 5 launches with real-time cyber safeguards turned on, the same ones present in Claude Opus 4.7 and 4.8. These detect and block dangerous cyber usage as it happens.

On Anthropic's evaluations, Sonnet 5 has substantially weaker dangerous-cyber skills than Opus 4.8 and Mythos 5. In a test built with Mozilla to develop exploits for vulnerabilities in Firefox 147, neither Sonnet 5 nor Sonnet 4.6 ever produced a working exploit (both scored 0.0%), though Sonnet 5 showed a slightly higher partial-success rate, which Anthropic attributes to general intelligence gains rather than cyber training.

For security researchers who need reduced guardrails, Anthropic recommends Opus 4.8 instead. The safeguards on Sonnet 5 are less strict than those that shipped with Fable 5, which blocked a wider range of cybersecurity tasks.

Claude Sonnet 5 Benchmarks

Anthropic's benchmark story for Sonnet 5 is consistent: it is a strict improvement over Sonnet 4.6 and sits just below Opus 4.8 on the evaluations published so far. The numbers below come from Anthropic's launch materials. One note of caution: third-party reviewers have reported different figures for the same benchmarks, likely due to different datasets, context settings, or agent scaffolds, so treat headline numbers as configuration-dependent.

  Sonnet 5 Sonnet 4.6 Opus 4.8
For reference
Agentic coding
SWE-bench Pro
63.2% 58.1% 69.2%
Agentic coding
Terminal-Bench 2.1
80.4% 67.0% 82.7%
Multidisciplinary reasoning
Humanity's Last Exam
43.2%
no tools
57.4%
with tools
34.6%
no tools
46.8%
with tools
49.8%
no tools
57.9%
with tools
Computer use
OSWorld-Verified
81.2% 78.5% 83.4%
Knowledge work
GDPval-AA v2
1618 1395 1615

Agentic coding

On the agentic coding benchmark Anthropic published, Sonnet 5 scores 63.2%, compared to Opus 4.8's 69.2% and Sonnet 4.6's 58.1%. This benchmark measures whether a model can write, run, and fix code across multiple steps rather than producing a single snippet.

The gap to Opus 4.8 is about 6 points, while the jump over Sonnet 4.6 is roughly 5 points. For developers, that means Sonnet 5 is a clear upgrade over the previous mid-tier model without quite matching the flagship.

Sonnet 5 scores on agentic computer use

Source: Anthropic

OSWorld-Verified and BrowseComp

OSWorld-Verified measures computer-use ability, controlling a desktop to complete real tasks, and BrowseComp measures agentic web search. Anthropic updated its OSWorld-Verified methodology and now reports Sonnet 4.6 at 78.5% on the revised setup.

Across both evaluations, Sonnet 5 is a strict improvement over Sonnet 4.6 at every effort level, while Opus 4.8 remains the higher-accuracy choice. At Extra High effort, Sonnet 5 reaches roughly Opus 4.8's medium-to-high performance, but at that point the cost advantage narrows, which is why Anthropic frames the two models as a single cost-accuracy range rather than a straight upgrade.

Knowledge work

On a knowledge work benchmark, Sonnet 5 slightly outperforms Opus 4.8, according to TechCrunch's reporting. That is notable because Opus 4.8 is the model usually associated with the hardest judgment calls and deep research.

Anthropic also updated its grader for Humanity's Last Exam and now reports Sonnet 4.6 at 34.6% (no tools) and 46.8% (with tools), which is why those figures differ from the original Sonnet 4.6 launch. For practitioners, the knowledge-work result suggests Sonnet 5 is viable for analysis and research tasks that previously felt like Opus territory.

How it compares to other Anthropic models

For a broader context on where the Sonnet tier sits, our coverage of Sakana Fugu vs Claude Fable 5 is useful. In that comparison, the higher-tier Claude Fable 5 scored 80.3% on SWE-Bench Pro and 59.0% on Humanity's Last Exam (no tools), well above the mid-tier figures here.

That spread is the point. Anthropic's lineup runs from the agentic, lower-cost Sonnet 5 up through Opus 4.8 and the Mythos and Fable classes, with each tier trading cost for accuracy on the hardest problems.

Claude Sonnet 5 Pricing and Availability

Claude Sonnet 5 is available everywhere from launch day. It is the default model for Free and Pro plans, and is available to Max, Team, and Enterprise users, as well as in Claude Code and on the Claude Platform.

Developers can call it via the Claude API using the model ID claude-sonnet-5. Pricing works in two phases:

  • Introductory pricing (through August 31, 2026): $2 per million input tokens and $10 per million output tokens
  • Standard pricing (after August 31, 2026): $3 per million input tokens and $15 per million output tokens

One detail to budget for: Sonnet 5 uses an updated tokenizer, so the same input can map to more tokens than before, roughly 1.0 to 1.35 times depending on content type. Anthropic set the introductory pricing so the transition from Sonnet 4.6 is roughly cost-neutral. The company also raised rate limits across Chat, Cowork, Claude Code, and the Claude Platform to handle the heavier token usage that higher effort levels bring.

Claude Sonnet 5 vs. Sonnet 4.6 and Opus 4.8 at a Glance

For readers who want the quick comparison, here is how the three models line up on the figures Anthropic published.

Model Agentic coding OSWorld-Verified Input / output price (per 1M tokens)
Sonnet 5 63.2% Improves on 4.6 at all effort levels $2 / $10 (intro), then $3 / $15
Sonnet 4.6 58.1% 78.5% (revised) $3 / $15
Opus 4.8 69.2% Higher-accuracy choice $5 / $25

Final Thoughts

Sonnet 5 is Anthropic's clearest statement yet that agentic capability is now the baseline expectation at every price tier, not a flagship-only feature. The differentiator is no longer who can do agentic work best, but who can do it cheaply and reliably without human oversight.

What I find most interesting is the introductory pricing. Anthropic seems to want customers to test Sonnet 5 against real workloads at the lowest possible cost during the migration window, which reads like a deliberate push to move people off Opus 4.8 for routine agentic work and free up that capacity. It also undercuts OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro on price, though Gemini 3.5 Flash remains cheaper.

If you are building agents that run sustained, multi-step tasks, Sonnet 5 looks like the new default to reach for first, dropping up to Opus 4.8 only when a task genuinely needs the extra accuracy. The honest caveat is that some third-party reviewers see Sonnet 5 trading reasoning depth for coding speed, so for PhD-level science or browsing-heavy work, Opus 4.8 may still be the safer pick.

If you want to get hands-on with the kind of automation Sonnet 5 is built for, I recommend our Claude Code Routines tutorial, which walks through running a coding agent on a schedule in the cloud. 

FAQs

How does Claude Sonnet 5 compare to Opus 4.8?

Sonnet 5 performs close to Opus 4.8 on agentic tasks but does not quite match it. On agentic coding it scores 63.2% versus Opus 4.8's 69.2%, though it slightly outperforms Opus 4.8 on one knowledge work benchmark. Opus 4.8 remains the higher-accuracy choice for the hardest reasoning and science tasks, while Sonnet 5 offers similar capability at a lower price.

Where can I access Claude Sonnet 5?

Claude Sonnet 5 is available across all plans from launch. It is the default model for Free and Pro plans and is available to Max, Team, and Enterprise users, in Claude Code, and on the Claude Platform. Developers can call it via the API. 

What does Claude Sonnet 5 cost?

Through August 31, 2026, Sonnet 5 runs at an introductory pricing of $2 per million input tokens and $10 per million output tokens. After that it is $3 per million input tokens and / $15 per million output tokens. 

What safety standards does Claude Sonnet 5 follow?

Anthropic's pre-deployment evaluations found Sonnet 5 is safer overall than Sonnet 4.6, with better refusal of malicious requests, stronger resistance to prompt injection, and lower rates of hallucination and sycophancy. It ships with real-time cyber safeguards enabled by default, the same ones used in Opus 4.7 and 4.8. 

What use cases is Claude Sonnet 5 best for?

Sonnet 5 is built for agentic, multi-step work like sustained coding, debugging, tool use, and two-system automation that previous Sonnet models would stall partway through. Testers highlighted strengths in brownfield code and root-cause tracing. For PhD-level science reasoning or browsing-heavy tasks, Opus 4.8 may still be the preferred choice. 


Matt Crabtree's photo
Author
Matt Crabtree
LinkedIn

A senior editor in the AI and edtech space. Committed to exploring data and AI trends.  

トピック

Top DataCamp Courses

Courses

Claude モデル入門

3時間
11.2K
Anthropic API を用い、AI アプリの構築やビジネスの課題解決に Claude を活用する方法を学びます。
詳細を見るRight Arrow
コースを開始
もっと見るRight Arrow
関連している
claude 4

blogs

Claude 4: Tests, Features, Access, Benchmarks, and More

Learn about Claude Sonnet 4 and Claude Opus 4, their features, use cases, benchmarks, and testing results.
Alex Olteanu's photo

Alex Olteanu

8 分

blogs

Claude Sonnet 4.6: Features, Access, Tests, and Benchmarks

Explore Anthropic’s Claude Sonnet 4.6, featuring a 1M token context window, near-Opus performance, and advanced agentic capabilities for coding and finance.
Tom Farnschläder's photo

Tom Farnschläder

10 分

Claude Sonnet 4.5 hailed as the best at coding in the world

blogs

Claude Sonnet 4.5: Tests, Features, Access, Benchmarks, and More

Learn about Claude Sonnet 4.5, the ‘best coding model in the world’. Explore new features, use cases, benchmarks, and testing results, plus a look at the Claude Agents SDK and Claude Imagine.
Matt Crabtree's photo

Matt Crabtree

8 分

Image representing claude 3.7 sonnet

blogs

Claude 3.7 Sonnet: Features, Access, Benchmarks & More

Learn about Claude 3.7 Sonnet's hybrid approach of combining reasoning mode and generalist mode, key benchmarks, and how to access it via web or API.
Alex Olteanu's photo

Alex Olteanu

8 分

blogs

Claude Opus 4.6: Features, Benchmarks, Hands-On Tests, and More

Anthropic’s latest model tops leaderboards in agentic coding and complex reasoning. Plus, it has a 1M context window.
Matt Crabtree's photo

Matt Crabtree

10 分

blogs

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

Claude 3.5 Sonnet outperforms GPT-4o and Gemini Pro 1.5 in several benchmarks and introduces a cool new feature: Artifacts.
Alex Olteanu's photo

Alex Olteanu

8 分

もっと見るもっと見る