Track
Sakana markets Fugu as matching Fable 5, but excludes Fable 5 from its own benchmark table. So, we're going to compare the two models side-by-side as much as is actually possible.
Here's the backstory. The US government suspended public access to Claude Fable 5 barely three days after Anthropic launched it. And Fable 5 was billed as its most capable model. Now, two weeks later, Tokyo's Sakana AI has shipped Fugu with some big claims. One claim in particular has made the rounds: Sakana AI says Fugu Ultra "stands shoulder-to-shoulder with leading models like Fable 5 and Mythos Preview" on the industry's hardest engineering, science, and reasoning benchmarks, and with none of the export-control risk. CEO David Ha said on X that Fugu is proof that a swappable pool of orchestrated agents can match restricted frontier models like Fable.
The claims are a little hard to check because Fable 5 isn't in Fugu's benchmark table at all. Sakana excludes it on the grounds that it isn't publicly accessible. We're doing what we can: We are checkign the handful of benchmarks that appear in both labs' published tables with matching baselines. And to wrap it up, we will talk about pricing and the access situation
If you want background on the two systems individually, we have blogs on that: read our Claude Fable 5 coverage and Sakana Fugu write-up.
What Is Sakana Fugu?
Sakana Fugu is not a single trained model in the usual sense. It's an orchestrator: a model that receives your request, decides whether to answer directly or delegate to specialist models in a pool, manages verification and synthesis, and returns one response through a single OpenAI-compatible API. From the outside you call one endpoint; on the inside, a coordinated set of frontier models does the work.
It ships in two variants. Fugu balances quality with low latency and is positioned as the everyday default for coding, review, and interactive services. Fugu Ultra coordinates a deeper pool of expert agents and is tuned for maximum answer quality on hard, multi-step problems — paper reproduction, cybersecurity analysis, Kaggle-style data science, patent investigations.
The idea is really two ideas.
- First, learned orchestration: the coordinator is trained to decide when to delegate and how to combine outputs, rather than running a hand-coded pipeline.
- Second, a swappable agent pool: when a new frontier model becomes publicly available, Sakana expects to spend roughly two weeks folding it in. (Important for the rest of the article: Fable 5 is not in that pool because it isn't publicly accessible.
What Is Claude Fable 5?
Claude Fable 5 is a Mythos-class model, whic his a tier Anthropic positions above its Opus class, made safe for general use through a set of classifiers. It's the same underlying model as Claude Mythos 5; the difference is that Fable 5 runs (ran) with safety classifiers active, while Mythos 5 has some of them lifted and is restricted to Project Glasswing partners and select biology researchers.
Anthropic's claim was that Fable 5 was state-of-the-art on nearly every benchmark Anthropic tracks, with the lead growing on longer, more complex tasks. The headline practical detail: when a query touches cybersecurity, biology/chemistry, or model distillation, a two-stage classifier reroutes the response to Claude Opus 4.8 and tells the user it did so.
Sakana Fugu vs. Claude Fable 5: Benchmarks
Sakana's published comparison table excludes Fable 5 and Mythos Preview, on the grounds that they aren't publicly accessible and therefore can't be in Fugu's pool. So Fugu's official numbers are measured against Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, all of which you can see in the table below. You can see it win on 10 of 11 benchmarks.
| Benchmark | Fugu | Fugu Ultra | Opus 4.8 † | Gemini 3.1 Pro † | GPT-5.5 † |
|---|---|---|---|---|---|
| SWE-Bench Pro * | 59.0 | 73.7 | 69.2 | 54.2 | 58.6 |
| TerminalBench 2.1 | 80.2 | 82.1 | 74.6 | 70.3 | 78.2 |
| LiveCodeBench | 92.9 | 93.2 | 87.8 | 88.5 | 85.3 |
| LiveCodeBench Pro | 87.8 | 90.8 | 84.8 | 82.9 | 88.4 |
| Humanity's Last Exam | 47.2 | 50.0 | 49.8 | 44.4 | 41.4 |
| CharXiv Reasoning | 85.1 | 86.6 | 84.2 | 83.3 | 84.1 |
| GPQA-D | 95.5 | 95.5 | 92.0 | 94.3 | 93.6 |
| SciCode | 60.1 | 58.7 | 53.5 | 58.9 | 56.1 |
| τ³ Banking | 21.7 | 20.6 | 20.6 | 8.4 | 20.6 |
| Long Context Reasoning | 74.7 | 73.3 | 67.7 | 72.7 | 74.3 |
| MRCRv2 | 86.6 | 93.6 | 87.9 | 84.9 | 94.8 |
* mini-swe-agent scaffolding. † provider-reported baselines. All Fugu scores are Sakana-reported and not yet independently reproduced.
To get Fable 5 into the picture, I cross-referenced the benchmarks that appear in both Anthropic's and Sakana's tables, and checked that the shared baselines line up. On SWE-Bench Pro and Humanity's Last Exam (no tools), the Opus 4.8, GPT-5.5, and Gemini 3.1 Pro numbers are identical across both sources — so those two comparisons are clean. Stripped down to just the two systems, the head-to-head looks like this:
| Benchmark | Sakana Fugu | Sakana Fugu Ultra | Claude Fable 5 | Leader |
|---|---|---|---|---|
| SWE-Bench Pro | 59.0 | 73.7 | 80.3 | Fable 5 (+6.6) |
| Humanity's Last Exam (no tools) | 47.2 | 50.0 | 59.0 | Fable 5 (+9.0) |
| Terminal-Bench 2.1 ‡ | 80.2 | 82.1 | 88.0 | Fable 5 (+5.9) |
‡ The two labs report different baselines and use different scaffolds for TerminalBench, so the conditions aren't identical.
These are the only benchmarks we found where the published results are directly comparable. Fable 5 leads all three.
So, on every benchmark where a side-by-side is even possible, Fable 5 comes out ahead of Fugu Ultra by roughly 6–9 points. That tracks with where Fable 5 is built to win, which is on long-horizon tasks graded at the end, where a single stronger model accumulates fewer compounding errors.
In sum:
- All Fugu numbers are self-reported and haven't shown up on third-party leaderboards yet.
- Sakana characterizes Fugu as "shoulder-to-shoulder" with Fable 5 and Mythos Preview. Given the gaps above, that's a defensible but generous reading. "Close, but trailing" is more accurate.
- The comparison sets only partially overlap. Fable 5 leads on vision (it can rebuild a web app's source from screenshots), which Fugu doesn't emphasize at all; Fugu publishes long-context and banking benchmarks that Anthropic's table doesn't cover. So they're optimized for somewhat different shapes of work.
Sakana Fugu vs. Claude Fable 5: Availability and Access
Claude Fable 5 is currently suspended. Anthropic pulled access to both Fable 5 and Mythos 5 on June 12 following a US government export control directive, and says it's working to restore access as soon as possible. Anthropic's other models, like Opus 4.8, are still available.
Sakana Fugu is available now through console.sakana.ai with an OpenAI-compatible API — except in the EU and EEA, where Sakana has paused availability while it works through GDPR compliance. I couldn't get an exact timeline on that.
Right now, a European team might not be able to use either model.
Final Thoughts
On paper, this is a close, genuine contest between two philosophies.
Anthropic is thinking about scale — one Mythos-class model so capable that it needs a parallel classifier system.
Sakana is betting on coordination — that a trained orchestrator over a swappable pool can stay within striking distance of any single frontier model while being cheaper, more resilient, and provider-agnostic.
The benchmarks, taken at face value, say Anthropic's bet produces the stronger artifact on the comparable tests, while Sakana's produces the more available and cheaper one.

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess!
Sakana Fugu vs. Claude Fable FAQs
Is Sakana Fugu better than Claude Fable 5?
On the benchmarks where a side-by-side is possible (SWE-Bench Pro, Humanity's Last Exam, Terminal-Bench), Fable 5 leads Fugu Ultra by roughly 6–9 points.
Why isn't Fable 5 in Fugu's benchmark table?
Sakana excludes Fable 5 and Mythos Preview because they aren't publicly accessible and therefore can't be part of Fugu's agent pool. Its official comparison is against Opus 4.8, GPT-5.5, and Gemini 3.1 Pro, all of which Fugu Ultra beats on 10 of 11 benchmarks.
Which is cheaper?
Fugu Ultra, at $5/M input and $30/M output, is roughly half the price of Fable 5's $10/M input and $50/M output. Both offer $20/$100/$200 monthly subscription tiers.
Will Fable 5 come back?
Anthropic says it's working to restore access to Fable 5 and Mythos 5 as quickly as possible, but hasn't published a timeline. Its other models, including Opus 4.8, remain available in the meantime.
Does Fugu actually route around Fable 5's suspension?
Not directly — Fable 5 was never in Fugu's pool, so Fugu can't recover its specific capabilities.