Claude Sonnet 5 vs. GPT-5.6: Benchmarks, Pricing, and Access

Sonnet 5 is available now at a discount. GPT-5.6 posts stronger numbers in Terminal-Bench-2.1 but isn't generally available yet. Here's the full breakdown.

30 juni 2026 · 7 min läsa

Anthropic and OpenAI both have new releases on the table this week, but the two launches are at different stages.

GPT-5.6 has been unveiled, but it's locked behind a limited preview. And while GPT-5.6 is not generally available, Claude Sonnet 5 shipped today and is live across every Claude plan — with introductory API pricing that, for now, actually undercuts GPT-5.6's mid-tier model, Terra.

Here's what we know about each model, how they stack up, and where the numbers overlap.

What Is Claude Sonnet 5?

Claude Sonnet 5 is Anthropic's latest mid-tier model, and the company is billing it as the most agentic Sonnet model yet. The pitch is that it closes much of the gap to Opus-class models on planning and tool use while staying meaningfully cheaper.

Anthropic says Sonnet 5 is a substantial step up from Sonnet 4.6 on reasoning, coding, and tool use, and that its performance now sits somewhat close to Opus 4.8 on several agentic evaluations, including BrowseComp (agentic search) and OSWorld-Verified (computer use).

Opus 4.8 still leads on raw accuracy, especially at the higher or highest effort settings, but Sonnet 5 does give developers a stronger price-to-performance option. Anthropic's mid-tier models have a track record of punching above their weight class when a new generation lands before the next Opus refresh catches up. It seems like a long time ago now, but Claude Sonnet 3.5 actually outperformed the older Claude Opus 3 on several benchmarks back in 2024.

Sonnet 5 also uses an updated tokenizer which means the same input text can map to more tokens than before (roughly 1.0–1.35x depending on content).

What Is GPT-5.6?

GPT-5.6 isn't a single model. It's a family of three, organized under a new naming scheme where the number marks the generation and the name marks the capability tier:

Sol is the flagship. It's the only tier with access to two new controls: a max reasoning-effort setting and ultra mode, which farms a task out to a pool of subagents.
Terra is the everyday, balanced model — roughly comparable to GPT-5.5 on overall quality at about half the price.
Luna is the low-cost tier, aimed at latency-sensitive workloads.

With the news, OpenAI is highlighting gains in coding, biology, and cybersecurity. The big benchmark story was its performance on several interesting benchmarks, including most notably Terminal-Bench 2.1, but also GeneBench v1, which is a long-horizon genomics benchmark, and two exploit-development benchmarks (ExploitBench and ExploitGym).

Claude Sonnet 5 and GPT-5.6 Benchmark Comparison

From the respective announcements, Sonnet 5 and GPT-5.6 don't share a single common benchmark with numbers from both sides.

Although Anthropic has published some hard numbers for Sonnet 5 in its launch post and system card, OpenAI has so far confirmed only one benchmark score for GPT-5.6: Terminal-Bench 2.1. Here's what's actually on the record for each side, laid out together:

Benchmark	Claude Sonnet 5	GPT-5.6 Sol	GPT-5.6 Sol Ultra	GPT-5.6 Terra	GPT-5.6 Luna
Terminal-Bench 2.1	Not published	88.8%	91.9%	82.5%	84.3%
SWE-bench Pro	63.2%	Not published	—	Not published	Not published
OSWorld-Verified (computer use)	81.2%	Not published	—	Not published	Not published
Humanity's Last Exam (with tools)	57.4%	Not published	—	Not published	Not published

Terminal-Bench 2.1 is the one row where GPT-5.6 has published anything at all, and Sonnet 5 has no published score to put there. Maybe that's strategic because GPT-5.6 did well.

On the benchmarks where Sonnet 5 does have numbers — SWE-bench Pro, OSWorld-Verified, Humanity's Last Exam — GPT-5.6 simply hasn't shown its hand yet, so there's nothing to compare against.

The honest conclusion is that a true Sonnet 5 vs. GPT-5.6 benchmark comparison doesn't fully exist yet. Each company has published a different set of evals.

Zooming out: where each model fits in the wider field

Since Sonnet 5 and GPT-5.6 don't overlap directly, it's worth pulling back and looking at how each one's published scores stack up against the rest of the field — Opus 4.8, the suspended Fable 5, GPT-5.5, and Gemini 3.1 Pro. This won't settle Sonnet 5 vs. GPT-5.6, but it shows where each one sits relative to the same set of competitors:

Benchmark	Claude Opus 4.8	Claude Fable 5 (pre-suspension)	GPT-5.5	Gemini 3.1 Pro	GPT-5.6 Sol Ultra
Terminal-Bench 2.1	78.9%	83.4%	88.0%	70.7%	91.9%
SWE-bench Pro	69.2%	80.3%	58.6%	—	Not published
Humanity's Last Exam (no tools)	49.8%	59.0%	—	44.4%	Not published

A couple of things: On Terminal-Bench 2.1, GPT-5.6 Sol Ultra's 91.9% outscores every Claude model on record, including Opus 4.8 at 78.9%. So OpenAI wasn't exaggerating when they were bragging about GPT-5.6's score on Terminal-Bench 2.1.

Fable 5 still leads Opus 4.8 and GPT-5.5 on SWE-bench Pro and Humanity's Last Exam, even while suspended. Its numbers just haven't been beaten yet, by anyone, including GPT-5.6 (which hasn't published a SWE-bench or HLE score to compare). Sonnet 5's 63.2% on SWE-bench Pro (from the table above) would slot in above GPT-5.5's 58.6% but below Opus 4.8 and Fable 5 — a loose proxy at best, since it's a different model generation being read across two companies' separate tables.

Claude Sonnet 5 and GPT-5.6 Pricing

Model	Input (per 1M tokens)	Output (per 1M tokens)
Claude Sonnet 5 (intro, through Aug 31, 2026)	$2.00	$10.00
Claude Sonnet 5 (standard)	$3.00	$15.00
GPT-5.6 Sol	$5.00	$30.00
GPT-5.6 Terra	$2.50	$15.00
GPT-5.6 Luna	$1.00	$6.00

On list price, Sonnet 5's introductory price lands between GPT-5.6 Terra and Luna, and well under Sol. But it's the introductory price that's the more interesting data point: at $2/$10, Sonnet 5 actually undercuts GPT-5.6 Terra's $2.50/$15, at least through the summer until there's a price increase.

To be clear, the pricing story is a little more involved than simple back-of-the-envelope math. Anthropic notes that Sonnet 5's introductory pricing is calibrated to be roughly cost-neutral against the new tokenizer's higher token counts. And OpenAI, for its part, is introducing more predictable prompt caching for GPT-5.6 and later models, with cache writes billed at 1.25x the uncached input rate and cache reads keeping a 90% discount.

Why launch with discounted introductory pricing at all, rather than just setting the standard rate from day one? The tokenizer explanation covers part of it, but the timing also lines up with what we are seeing: a crowded and fast-moving field. In the last few weeks alone, Sakana AI shipped Fugu, an orchestrator that routes across a pool of frontier models and claims to come within striking distance of Anthropic's own Mythos-class systems at a fraction of the cost. So pricing Sonnet 5 at $2/$10 through the end of August gives Anthropic a genuinely cheap, broadly available agentic model to point to while GPT-5.6 is still locked in preview. Being the model people can actually use, at a low price, is a real competitive advantage.

Claude Sonnet 5 and GPT-5.6 Availability

Claude Sonnet 5 is available today, everywhere. It's the default model on Free and Pro plans, accessible on Max, Team, and Enterprise, and live in Claude Code and on the Claude Platform via the API (model string claude-sonnet-5). You may have noticed Claude Sonnet 5 in the chat window before you even saw the announcement.

GPT-5.6 is not generally available. During the preview, it's accessible only through the API and Codex to a select group of partners. OpenAI says broader availability is coming, but hasn't given a date.

Final Thoughts

On the one overlapping benchmark, GPT-5.6's flagship tier reports higher scores than Claude's current flagship, Opus 4.8, which suggests OpenAI may have a real capability lead once GPT-5.6 ships broadly. But Sonnet 5 isn't really Sol's or Terra's direct competitor on capability — it's a mid-tier model competing on agentic performance per dollar, and on that front it's available, it's priced aggressively, and it works today.

Author

Josef Waples

Ämnen

Artificial Intelligence

Learn with DataCamp

course

Förstå prompt engineering

1 timmar

206.9K

Lär dig skriva effektiva prompts med ChatGPT för att använda i ditt arbetsflöde redan idag.

Se detaljer

Starta kursen

course

Intermediate ChatGPT

1 timmar

29.9K

Lär dig arkitekturen bakom GPT-modeller och bemästra avancerad promptdesign för att låsa upp ChatGPT:s fulla potential.

Se detaljer

Starta kursen

course

Introduction to Claude Models

3 timmar

11.2K

Learn how to work with Claude using the Anthropic API to solve real-world tasks and build AI-powered applications.

Se detaljer

Starta kursen

Se mer

Släkt

blog

Claude Sonnet 5: Features, Benchmarks, Pricing, and More

Claude Sonnet 5 nears Opus 4.8 on agentic benchmarks at lower cost. Discover its features, benchmarks, pricing, and more.

Matt Crabtree

9 min

blog

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Fable 5 leads on raw capability benchmarks, but GPT-5.5 wins on access, pricing, and fewer classifier interruptions. Here's how to choose.

Tom Farnschläder

11 min

blog

Claude Opus 4.8 vs GPT-5.5: Benchmarks, Tests, and Which to Choose

A head-to-head comparison of Anthropic's Claude Opus 4.8 and OpenAI's GPT-5.5 across coding, reasoning, agentic tasks, and pricing.

Tom Farnschläder

11 min

blog

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

Claude 3.5 Sonnet outperforms GPT-4o and Gemini Pro 1.5 in several benchmarks and introduces a cool new feature: Artifacts.

Alex Olteanu

8 min

blog

GPT 5.2: Benchmarks, Model Breakdown, and Real-World Performance

Discover how GPT-5.2 improves knowledge work with major upgrades in long-context reasoning, tool calling, coding, vision, and end-to-end workflow execution.

Josef Waples

10 min

blog

GPT-5.4 mini and nano: Benchmarks, Access, and Reactions

Take a close look at OpenAI's latest small models, which are built for speed. Compare performance and pricing with Claude Haiku 4.5.

Josef Waples

7 min

Se mer Se mer

What Is Claude Sonnet 5?

What Is GPT-5.6?

Claude Sonnet 5 and GPT-5.6 Benchmark Comparison

Zooming out: where each model fits in the wider field

Claude Sonnet 5 and GPT-5.6 Pricing

Claude Sonnet 5 and GPT-5.6 Availability

Final Thoughts

Claude Sonnet 5: Features, Benchmarks, Pricing, and More

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Opus 4.8 vs GPT-5.5: Benchmarks, Tests, and Which to Choose

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

GPT 5.2: Benchmarks, Model Breakdown, and Real-World Performance

GPT-5.4 mini and nano: Benchmarks, Access, and Reactions

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}Förstå prompt engineering

Intermediate ChatGPT

Introduction to Claude Models

Claude Sonnet 5: Features, Benchmarks, Pricing, and More

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Opus 4.8 vs GPT-5.5: Benchmarks, Tests, and Which to Choose

What Is Claude 3.5 Sonnet? How It Works, Use Cases, and Artifacts

GPT 5.2: Benchmarks, Model Breakdown, and Real-World Performance

GPT-5.4 mini and nano: Benchmarks, Access, and Reactions

Förstå prompt engineering