Skip to main content

Sakana Fugu vs. Claude Fable 5: Benchmarks, Pricing, & More

Claude Fable 5 wins on benchmarks but is currently suspended. Sakana Fugu is available now and costs half as much.
Jun 25, 2026  · 6 min read

Sakana markets Fugu as matching Fable 5, but excludes Fable 5 from its own benchmark table. So, we're going to compare the two models side-by-side as much as is actually possible.

Here's the backstory. The US government suspended public access to Claude Fable 5 barely three days after Anthropic launched it. And Fable 5 was billed as its most capable model. Now, two weeks later, Tokyo's Sakana AI has shipped Fugu with some big claims. One claim in particular has made the rounds: Sakana AI says Fugu Ultra "stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview" on the industry's hardest engineering, science, and reasoning benchmarks, and with none of the export-control risk. CEO David Ha said on X that Fugu is proof that a swappable pool of orchestrated agents can match restricted frontier models like Fable.

The claims are a little hard to check because Fable 5 isn't in Fugu's benchmark table at all. Sakana excludes it on the grounds that it isn't publicly accessible. We're doing what we can: We are checkign the handful of benchmarks that appear in both labs' published tables with matching baselines. And to wrap it up, we will talk about pricing and the access situation

If you want background on the two systems individually, we have blogs on that: read our Claude Fable 5 coverage and Sakana Fugu write-up.

What Is Sakana Fugu?

Sakana Fugu is not a single trained model in the usual sense. It's an orchestrator: a model that receives your request, decides whether to answer directly or delegate to specialist models in a pool, manages verification and synthesis, and returns one response through a single OpenAI-compatible API. From the outside you call one endpoint; on the inside, a coordinated set of frontier models does the work.

It ships in two variants. Fugu balances quality with low latency and is positioned as the everyday default for coding, review, and interactive services. Fugu Ultra coordinates a deeper pool of expert agents and is tuned for maximum answer quality on hard, multi-step problems — paper reproduction, cybersecurity analysis, Kaggle-style data science, patent investigations.

The idea is really two ideas.

  • First, learned orchestration: the coordinator is trained to decide when to delegate and how to combine outputs, rather than running a hand-coded pipeline.
  • Second, a swappable agent pool: when a new frontier model becomes publicly available, Sakana expects to spend roughly two weeks folding it in. (Important for the rest of the article: Fable 5 is not in that pool because it isn't publicly accessible.

What Is Claude Fable 5?

Claude Fable 5 is a Mythos-class model, whic his a tier Anthropic positions above its Opus class, made safe for general use through a set of classifiers. It's the same underlying model as Claude Mythos 5; the difference is that Fable 5 runs (ran) with safety classifiers active, while Mythos 5 has some of them lifted and is restricted to Project Glasswing partners and select biology researchers.

Anthropic's claim was that Fable 5 was state-of-the-art on nearly every benchmark Anthropic tracks, with the lead growing on longer, more complex tasks. The headline practical detail: when a query touches cybersecurity, biology/chemistry, or model distillation, a two-stage classifier reroutes the response to Claude Opus 4.8 and tells the user it did so. 

Sakana Fugu vs. Claude Fable 5: Benchmarks

Sakana's published comparison table excludes Fable 5 and Mythos Preview, on the grounds that they aren't publicly accessible and therefore can't be in Fugu's pool. So Fugu's official numbers are measured against Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, all of which you can see in the table below. You can see it win on 10 of 11 benchmarks. 

Benchmark Fugu Fugu Ultra Opus 4.8 † Gemini 3.1 Pro † GPT-5.5 †
SWE-Bench Pro * 59.0 73.7 69.2 54.2 58.6
TerminalBench 2.1 80.2 82.1 74.6 70.3 78.2
LiveCodeBench 92.9 93.2 87.8 88.5 85.3
LiveCodeBench Pro 87.8 90.8 84.8 82.9 88.4
Humanity's Last Exam 47.2 50.0 49.8 44.4 41.4
CharXiv Reasoning 85.1 86.6 84.2 83.3 84.1
GPQA-D 95.5 95.5 92.0 94.3 93.6
SciCode 60.1 58.7 53.5 58.9 56.1
τ³ Banking 21.7 20.6 20.6 8.4 20.6
Long Context Reasoning 74.7 73.3 67.7 72.7 74.3
MRCRv2 86.6 93.6 87.9 84.9 94.8

* mini-swe-agent scaffolding. † provider-reported baselines. All Fugu scores are Sakana-reported and not yet independently reproduced.

To get Fable 5 into the picture, I cross-referenced the benchmarks that appear in both Anthropic's and Sakana's tables, and checked that the shared baselines line up. On SWE-Bench Pro and Humanity's Last Exam (no tools), the Opus 4.8, GPT-5.5, and Gemini 3.1 Pro numbers are identical across both sources — so those two comparisons are clean. Stripped down to just the two systems, the head-to-head looks like this:

Benchmark Sakana Fugu Sakana Fugu Ultra Claude Fable 5 Leader
SWE-Bench Pro 59.0 73.7 80.3 Fable 5 (+6.6)
Humanity's Last Exam (no tools) 47.2 50.0 59.0 Fable 5 (+9.0)
Terminal-Bench 2.1 ‡ 80.2 82.1 88.0 Fable 5 (+5.9)

‡ The two labs report different baselines and use different scaffolds for TerminalBench, so the conditions aren't identical.

These are the only benchmarks we found where the published results are directly comparable. Fable 5 leads all three.

So, on every benchmark where a side-by-side is even possible, Fable 5 comes out ahead of Fugu Ultra by roughly 6–9 points. That tracks with where Fable 5 is built to win, which is on long-horizon tasks graded at the end, where a single stronger model accumulates fewer compounding errors.

In sum:

  1. All Fugu numbers are self-reported and haven't shown up on third-party leaderboards yet.
  2. Sakana characterizes Fugu as "shoulder-to-shoulder" with Fable 5 and Mythos Preview. Given the gaps above, that's a defensible but generous reading. "Close, but trailing" is more accurate.
  3. The comparison sets only partially overlap. Fable 5 leads on vision (it can rebuild a web app's source from screenshots), which Fugu doesn't emphasize at all; Fugu publishes long-context and banking benchmarks that Anthropic's table doesn't cover. So they're optimized for somewhat different shapes of work.

Sakana Fugu vs. Claude Fable 5: Availability and Access

Claude Fable 5 is currently suspended. Anthropic pulled access to both Fable 5 and Mythos 5 on June 12 following a US government export control directive, and says it's working to restore access as soon as possible. Anthropic's other models, like Opus 4.8, are still available.

Sakana Fugu is available now through console.sakana.ai with an OpenAI-compatible API — except in the EU and EEA, where Sakana has paused availability while it works through GDPR compliance. I couldn't get an exact timeline on that.

Right now, a European team might not be able to use either model.

Final Thoughts

On paper, this is a close, genuine contest between two philosophies.

Anthropic is thinking about scale — one Mythos-class model so capable that it needs a parallel classifier system.

Sakana is betting on coordination — that a trained orchestrator over a swappable pool can stay within striking distance of any single frontier model while being cheaper, more resilient, and provider-agnostic.

The benchmarks, taken at face value, say Anthropic's bet produces the stronger artifact on the comparable tests, while Sakana's produces the more available and cheaper one.


Josef Waples's photo
Author
Josef Waples

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess! 

Sakana Fugu vs. Claude Fable FAQs

Is Sakana Fugu better than Claude Fable 5?

On the benchmarks where a side-by-side is possible (SWE-Bench Pro, Humanity's Last Exam, Terminal-Bench), Fable 5 leads Fugu Ultra by roughly 6–9 points. 

Why isn't Fable 5 in Fugu's benchmark table?

Sakana excludes Fable 5 and Mythos Preview because they aren't publicly accessible and therefore can't be part of Fugu's agent pool. Its official comparison is against Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, all of which Fugu Ultra beats on 10 of 11 benchmarks.

Which is cheaper?

Fugu Ultra, at $5/M input and $30/M output, is roughly half the price of Fable 5's $10/M input and $50/M output. Both offer $20/$100/$200 monthly subscription tiers.

Will Fable 5 come back?

Anthropic says it's working to restore access to Fable 5 and Mythos 5 as quickly as possible, but hasn't published a timeline. Its other models, including Opus 4.8, remain available in the meantime.

Does Fugu actually route around Fable 5's suspension?

Not directly — Fable 5 was never in Fugu's pool, so Fugu can't recover its specific capabilities.

Topics

Learn AI with DataCamp

Track

AI for Software Engineering

7 hr
Write code and build software applications faster than ever before with the latest AI developer tools, including GitHub Copilot, Windsurf, and Replit.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

blog

Sakana Fugu: Features, Benchmarks, and How It Works

Sakana AI's Fugu orchestrates a pool of frontier LLMs behind one API. We cover the features, benchmark numbers, pricing, and real-world use cases.
Matt Crabtree's photo

Matt Crabtree

12 min

blog

Claude Fable 5 vs. Gemini 3.5 Flash: Benchmarks, Pricing, and More

Claude Fable 5 dominates on raw capability, but Gemini 3.5 Flash delivers near-frontier performance at a fraction of the cost and several times the speed. Keep reading to learn more.
Josef Waples's photo

Josef Waples

9 min

blog

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Fable 5 leads on raw capability benchmarks, but GPT-5.5 wins on access, pricing, and fewer classifier interruptions. Here's how to choose.
Tom Farnschläder's photo

Tom Farnschläder

11 min

blog

Claude Fable 5: A Mythos-Class Model You Can Use

Anthropic's Claude Fable 5 is the new state-of-the-art AI model, delivering a clean sweep of every major benchmark including SWE-Bench Pro, FrontierCode Diamond, and Humanity's Last Exam.
Josef Waples's photo

Josef Waples

10 min

blog

Claude Mythos 5: Features, Benchmarks, and What It Can Do

Anthropic's most capable model yet, Claude Mythos 5 brings Mythos-class AI to cybersecurity, drug design, and scientific research with the safeguards lifted for trusted partners.
Tom Farnschläder's photo

Tom Farnschläder

11 min

blog

DeepSeek vs. Claude: Comparing Two Leading AI Models

Explore how DeepSeek and Claude differ in reasoning, coding, language generation, and pricing to find the right AI model for your workflow.
Vinod Chugani's photo

Vinod Chugani

9 min

See MoreSee More