Sakana Fugu vs. Claude Fable 5: Benchmarks, Pricing, & More

Claude Fable 5 wins on benchmarks but is currently suspended. Sakana Fugu is available now and costs half as much.

Jun 25, 2026 · 6 min read

Sakana markets Fugu as matching Fable 5, but excludes Fable 5 from its own benchmark table. So, we're going to compare the two models side-by-side as much as is actually possible.

Here's the backstory. The US government suspended public access to Claude Fable 5 barely three days after Anthropic launched it. And Fable 5 was billed as its most capable model. Now, two weeks later, Tokyo's Sakana AI has shipped Fugu with some big claims. One claim in particular has made the rounds: Sakana AI says Fugu Ultra "stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview" on the industry's hardest engineering, science, and reasoning benchmarks, and with none of the export-control risk. CEO David Ha said on X that Fugu is proof that a swappable pool of orchestrated agents can match restricted frontier models like Fable.

The claims are a little hard to check because Fable 5 isn't in Fugu's benchmark table at all. Sakana excludes it on the grounds that it isn't publicly accessible. We're doing what we can: We are checkign the handful of benchmarks that appear in both labs' published tables with matching baselines. And to wrap it up, we will talk about pricing and the access situation

If you want background on the two systems individually, we have blogs on that: read our Claude Fable 5 coverage and Sakana Fugu write-up.

What Is Sakana Fugu?

Sakana Fugu is not a single trained model in the usual sense. It's an orchestrator: a model that receives your request, decides whether to answer directly or delegate to specialist models in a pool, manages verification and synthesis, and returns one response through a single OpenAI-compatible API. From the outside you call one endpoint; on the inside, a coordinated set of frontier models does the work.

It ships in two variants. Fugu balances quality with low latency and is positioned as the everyday default for coding, review, and interactive services. Fugu Ultra coordinates a deeper pool of expert agents and is tuned for maximum answer quality on hard, multi-step problems — paper reproduction, cybersecurity analysis, Kaggle-style data science, patent investigations.

The idea is really two ideas.

First, learned orchestration: the coordinator is trained to decide when to delegate and how to combine outputs, rather than running a hand-coded pipeline.
Second, a swappable agent pool: when a new frontier model becomes publicly available, Sakana expects to spend roughly two weeks folding it in. (Important for the rest of the article: Fable 5 is not in that pool because it isn't publicly accessible.

What Is Claude Fable 5?

Claude Fable 5 is a Mythos-class model, whic his a tier Anthropic positions above its Opus class, made safe for general use through a set of classifiers. It's the same underlying model as Claude Mythos 5; the difference is that Fable 5 runs (ran) with safety classifiers active, while Mythos 5 has some of them lifted and is restricted to Project Glasswing partners and select biology researchers.

Anthropic's claim was that Fable 5 was state-of-the-art on nearly every benchmark Anthropic tracks, with the lead growing on longer, more complex tasks. The headline practical detail: when a query touches cybersecurity, biology/chemistry, or model distillation, a two-stage classifier reroutes the response to Claude Opus 4.8 and tells the user it did so.

Sakana Fugu vs. Claude Fable 5: Benchmarks

Sakana's published comparison table excludes Fable 5 and Mythos Preview, on the grounds that they aren't publicly accessible and therefore can't be in Fugu's pool. So Fugu's official numbers are measured against Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, all of which you can see in the table below. You can see it win on 10 of 11 benchmarks.

Benchmark	Fugu	Fugu Ultra	Opus 4.8 †	Gemini 3.1 Pro †	GPT-5.5 †
SWE-Bench Pro *	59.0	73.7	69.2	54.2	58.6
TerminalBench 2.1	80.2	82.1	74.6	70.3	78.2
LiveCodeBench	92.9	93.2	87.8	88.5	85.3
LiveCodeBench Pro	87.8	90.8	84.8	82.9	88.4
Humanity's Last Exam	47.2	50.0	49.8	44.4	41.4
CharXiv Reasoning	85.1	86.6	84.2	83.3	84.1
GPQA-D	95.5	95.5	92.0	94.3	93.6
SciCode	60.1	58.7	53.5	58.9	56.1
τ³ Banking	21.7	20.6	20.6	8.4	20.6
Long Context Reasoning	74.7	73.3	67.7	72.7	74.3
MRCRv2	86.6	93.6	87.9	84.9	94.8

* mini-swe-agent scaffolding. † provider-reported baselines. All Fugu scores are Sakana-reported and not yet independently reproduced.

To get Fable 5 into the picture, I cross-referenced the benchmarks that appear in both Anthropic's and Sakana's tables, and checked that the shared baselines line up. On SWE-Bench Pro and Humanity's Last Exam (no tools), the Opus 4.8, GPT-5.5, and Gemini 3.1 Pro numbers are identical across both sources — so those two comparisons are clean. Stripped down to just the two systems, the head-to-head looks like this:

Benchmark	Sakana Fugu	Sakana Fugu Ultra	Claude Fable 5	Leader
SWE-Bench Pro	59.0	73.7	80.3	Fable 5 (+6.6)
Humanity's Last Exam (no tools)	47.2	50.0	59.0	Fable 5 (+9.0)
Terminal-Bench 2.1 ‡	80.2	82.1	88.0	Fable 5 (+5.9)

‡ The two labs report different baselines and use different scaffolds for TerminalBench, so the conditions aren't identical.

These are the only benchmarks we found where the published results are directly comparable. Fable 5 leads all three.

So, on every benchmark where a side-by-side is even possible, Fable 5 comes out ahead of Fugu Ultra by roughly 6–9 points. That tracks with where Fable 5 is built to win, which is on long-horizon tasks graded at the end, where a single stronger model accumulates fewer compounding errors.

In sum:

All Fugu numbers are self-reported and haven't shown up on third-party leaderboards yet.
Sakana characterizes Fugu as "shoulder-to-shoulder" with Fable 5 and Mythos Preview. Given the gaps above, that's a defensible but generous reading. "Close, but trailing" is more accurate.
The comparison sets only partially overlap. Fable 5 leads on vision (it can rebuild a web app's source from screenshots), which Fugu doesn't emphasize at all; Fugu publishes long-context and banking benchmarks that Anthropic's table doesn't cover. So they're optimized for somewhat different shapes of work.

Sakana Fugu vs. Claude Fable 5: Availability and Access

Claude Fable 5 is currently suspended. Anthropic pulled access to both Fable 5 and Mythos 5 on June 12 following a US government export control directive, and says it's working to restore access as soon as possible. Anthropic's other models, like Opus 4.8, are still available.

Sakana Fugu is available now through console.sakana.ai with an OpenAI-compatible API — except in the EU and EEA, where Sakana has paused availability while it works through GDPR compliance. I couldn't get an exact timeline on that.

Right now, a European team might not be able to use either model.

Final Thoughts

On paper, this is a close, genuine contest between two philosophies.

Anthropic is thinking about scale — one Mythos-class model so capable that it needs a parallel classifier system.

Sakana is betting on coordination — that a trained orchestrator over a swappable pool can stay within striking distance of any single frontier model while being cheaper, more resilient, and provider-agnostic.

The benchmarks, taken at face value, say Anthropic's bet produces the stronger artifact on the comparable tests, while Sakana's produces the more available and cheaper one.

Author

Josef Waples

Is Sakana Fugu better than Claude Fable 5?

Why isn't Fable 5 in Fugu's benchmark table?

Which is cheaper?

Will Fable 5 come back?

Does Fugu actually route around Fable 5's suspension?

Topics

Artificial Intelligence

Learn AI with DataCamp

Track

AI for Software Engineering

7 hr

Write code and build software applications faster than ever before with the latest AI developer tools, including GitHub Copilot, Windsurf, and Replit.

See Details

Start Course

Course

Software Development with Claude Code

4 hr

4.2K

Claude Code brings AI assistance to your terminal. Learn the workflows that turn it into a reliable tool for real software development.

See Details

Start Course

Course

Introduction to Agent Skills

2 hr 30 min

1.4K

Learn how to build, configure, and share Skills in Claude Code — reusable markdown instructions that Claude automatically applies to tasks at the right time.

See Details

Start Course

blog

Sakana Fugu: Features, Benchmarks, and How It Works

Sakana AI's Fugu orchestrates a pool of frontier LLMs behind one API. We cover the features, benchmark numbers, pricing, and real-world use cases.

Matt Crabtree

12 min

blog

Claude Fable 5 vs. Gemini 3.5 Flash: Benchmarks, Pricing, and More

Claude Fable 5 dominates on raw capability, but Gemini 3.5 Flash delivers near-frontier performance at a fraction of the cost and several times the speed. Keep reading to learn more.

Josef Waples

9 min

blog

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Fable 5 leads on raw capability benchmarks, but GPT-5.5 wins on access, pricing, and fewer classifier interruptions. Here's how to choose.

Tom Farnschläder

11 min

blog

Claude Fable 5: A Mythos-Class Model You Can Use

Anthropic's Claude Fable 5 is the new state-of-the-art AI model, delivering a clean sweep of every major benchmark including SWE-Bench Pro, FrontierCode Diamond, and Humanity's Last Exam.

Josef Waples

10 min

blog

Claude Mythos 5: Features, Benchmarks, and What It Can Do

Anthropic's most capable model yet, Claude Mythos 5 brings Mythos-class AI to cybersecurity, drug design, and scientific research with the safeguards lifted for trusted partners.

Tom Farnschläder

11 min

blog

DeepSeek vs. Claude: Comparing Two Leading AI Models

Explore how DeepSeek and Claude differ in reasoning, coding, language generation, and pricing to find the right AI model for your workflow.

Vinod Chugani

9 min

See More See More

What Is Sakana Fugu?

What Is Claude Fable 5?

Sakana Fugu vs. Claude Fable 5: Benchmarks

Sakana Fugu vs. Claude Fable 5: Availability and Access

Final Thoughts

Sakana Fugu vs. Claude Fable FAQs

Which is cheaper?

Will Fable 5 come back?

Does Fugu actually route around Fable 5's suspension?

Sakana Fugu: Features, Benchmarks, and How It Works

Claude Fable 5 vs. Gemini 3.5 Flash: Benchmarks, Pricing, and More

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Fable 5: A Mythos-Class Model You Can Use

Claude Mythos 5: Features, Benchmarks, and What It Can Do

DeepSeek vs. Claude: Comparing Two Leading AI Models

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}AI for Software Engineering

Software Development with Claude Code

Introduction to Agent Skills

Sakana Fugu: Features, Benchmarks, and How It Works

Claude Fable 5 vs. Gemini 3.5 Flash: Benchmarks, Pricing, and More

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Fable 5: A Mythos-Class Model You Can Use

Claude Mythos 5: Features, Benchmarks, and What It Can Do

DeepSeek vs. Claude: Comparing Two Leading AI Models

AI for Software Engineering