Skip to main content

Claude Fable 5: A Mythos-Class Model You Can Use

Anthropic's Claude Fable 5 is the new state-of-the-art AI model, delivering a clean sweep of every major benchmark including SWE-Bench Pro, FrontierCode Diamond, and Humanity's Last Exam.
Jun 9, 2026  · 10 min read

We first started hearing about Claude Mythos back in April when we wrote about the release of Claude Opus 4.7. We understood at the time that Opus 4.7, while extremely impressive in its own right, was only the best public available model because the Mythos-class models were being held back from broad release.

Well, Anthropic has now given us Claude Fable 5, a Mythos-class model safe for general use. It’s undoubtedly the new leader in the important benchmark tests, including things like knowledge work, software engineering, and scientific research. It's a big win for Anthropic, which only eight days ago filed for a U.S. IPO.

In this article, we will take a close look at Claude Fable 5, how it compares with other models on the market, like GPT-5.5 and Gemini 3.1 Pro, as well as Claude's own models, like Opus 4.8. We will also offer some guidance on when to use the model because it's more expensive.

What Is Claude Fable 5?

Claude Fable 5 is the name Anthropic is giving to its newest and most capable model. It is a Mythos-class model, but it includes important safeguards in important areas, namely cybersecurity. 

For the user, this means that certain questions will be answered by the next-in-line model, Claude Opus 4.8. Anthropic estimates that this kind of rerouting happens in fewer than 5% of all queries, which Anthropic thinks is conservative. That’s just as well, since cybersecurity concerns are involved. Also, of course, that percent is going to vary a lot depending on what kind of work you do. 

We are especially excited about Claude Fable 5. The model is so capable now that it has moved into new areas of usefulness: Cyber defenders will use the model to secure critical systems, and life science researchers could use the model to improve new therapeutics. Astrophysicists can use these models to map galaxies. On and on.

If you read the release, you might have heard of Claude Mythos 5 also. If you’re wondering: Claude Mythos 5 is the same underlying model but with some of the safeguards lifted. It’s not for general use: It’s deployed at part of Project Glasswing in partnership with the U.S. government, effectively replacing Claude Mythos Preview. We don’t know as much about Claude Mythos 5, but we do know that it has the strongest cybersecurity capabilities of any model in the world. Anthropic’s partnership with the U.S. government is evidence of that itself. 

What's New with Claude Fable 5

Safety classifiers 

The biggest structural addition in Fable 5 is a two-stage classifier system designed to prevent misuse of the model's most dangerous capabilities. A probe monitors the model's internal activations across all traffic, and any flagged requests are escalated to a separate trained LLM classifier that makes the final call. When a request is blocked, it's rerouted to Opus 4.8 instead of being outright refused, so the user still gets a useful response.

Fallback notification system 

This is similar to the point above. When Fable 5's classifiers reroute a query to Opus 4.8, users are notified which model handled their request. This is a new transparency behavior, so users wouldn’t have to wonder if a response felt different.

File-based memory as a thinking tool 

Fable 5 can write notes to a file mid-task and refer back to them later, which is distinct from simply having a long context window. This is an active working memory behavior. Anthropic's own testing showed it improved performance significantly more than it did for Opus 4.8.

30-day data retention policy 

Anthropic is introducing a new data retention policy that applies to all Fable 5 traffic. Data is retained for 30 days to help detect novel jailbreaks and reduce false positives. Anthropic says this data won't be used for model training, and human access to it is logged.

Claude Fable 5 Benchmark Results

Claude Fable 5 wins hands-down on all of the benchmark tests that we watch closely. On SWE-Bench Pro, it scores 80.3%, ahead of Opus 4.8's 69.2%. On Humanity's Last Exam with tools, it scores 64.5%, compared to 57.9% for Opus 4.8 and 52.2% for GPT-5.5.

  • SWE-Bench Pro
  • Frontier Code (Diamond)
  • GDPval-AA
  • OSWorld-Verified
  • Humanity’s Last Exam
  • Terminal-Bench 2.1
Category Benchmark Claude Mythos 5 / Fable 5 Claude Mythos Preview Claude Opus 4.8 GPT 5.5 Gemini 3.1 Pro
Agentic coding SWE-Bench Pro 80.3% 77.8% 69.2% 58.6% 54.2%
Agentic coding FrontierCode (Diamond) 29.3% (xhigh) 13.4% (xhigh) 5.7% (xhigh)
Knowledge work GDPval-AA 1932 1890 1769 1314
Knowledge work vision GDP.pdf 29.8% (no tools) 22.5% (no tools) 24.9% (no tools) 16.7% (no tools)
Spatial reasoning Blueprint-Bench 2 38.6% 14.5% 36.2% 26.5%
Tool use AutomationBench 17.4% 15.5% 12.9% 9.6%
Computer use OSWorld-Verified 85.0% 85.4% 83.4% 78.7% 76.2%
Legal Legal Agent Benchmark 13.3% 10.4% 2.1% 0.0%
Multidisciplinary reasoning Humanity's Last Exam (no tools) 59.0%* 56.8% 49.8% 41.4% 44.4%
Multidisciplinary reasoning Humanity's Last Exam (with tools) 64.5%* 64.7% 57.9% 52.2% 51.4%
Biology BioMysteryBench (hard) 46.1%* 29.6% 40.0%
Biology BioMysteryBench (human solved) 83.9%* 82.6% 80.4%
Agentic coding Terminal-Bench 2.1 88.0%* 82.7% 83.4% (Codex CLI) 70.7% (Gemini CLI)
Cybersecurity ExploitBench (Cap%) 78.0%* 69.0% 40.0% 34.0%
Health HealthBench Professional 66.0%* 64.7% 56.9% 51.8%

Claude Fable 5 for software engineering

I mentioned the FrontierCode benchmark test but didn’t get a chance to go into it. FrontierCode tests if a model can not only pass a coding test, which itself is difficult, but also if it can meet higher, more general standards for high-quality production codebases, which would mean that the code would have to be performant at scale, idiomatic, structured for long-term maintainability, etc.

Claude Fable 5 scores higher on the FrontierCode test than any other model without even using the high or xhigh effort parameters. 

Claude Fable 5 Accuracy vs Cost

Claude Fable 5 Agentic Coding

Claude Fable 5 for vision

Claude Fable 5 is the new best model for vision tasks. In Anthropic's testing, Fable 5 could rebuild a web application's source code from screenshots alone with no scaffolding. This is the same capability that shows up in its 93.2% score on CharXiv Reasoning with tools.

The Fable 5 model also does extremely well on a problem that more people probably encounter: It can extract precise numbers from detailed scientific figures without making many mistakes.

Claude Fable 5 memory and long-context

We have been talking a lot about AI models’ memory and long-context abilities all year. With Fable 5, Anthropic brings us some new things to talk about. 

For one, Fable 4 stays focused across millions of tokens. If you’re not familiar with the idea, AI models, when doing long tasks, can ‘forget’ or lose track of earlier context as the conversation gets longer and longer. But Fable 5 is better at maintaining coherent, consistent behavior even when the task is very long. How long is very long? For context, a million tokens is a 700-page novel. 

Anthropic also tells us, interestingly, that Fable 5 improves its outputs using its own notes. So, the model can write things down mid-task (to a file, as an example) and refer back to them (the things it wrote down) later. It’s akin to a person jotting notes while working through a problem. This is different from having a long memory. T model uses external storage as a thinking tool.

Claude Fable 5 vs. GPT-5.5 vs. Gemini 3.1 Pro

The table above already shows Fable 5 winning in every category. But one useful question is the shape of each lead: where it's a chasm, where it's a coin flip, and why.

Claude Fable 5 vs. GPT-5.5

GPT-5.5 is the closest competitor, and on most tasks the two are nearly tied. Blueprint-Bench 2 spatial reasoning is 36.2 for GPT-5.5 and 38.6 for Fable 5. They stay close even on long agentic tasks, as long as the model gets feedback as it works. On Terminal-Bench 2.1, where the terminal flags errors right away, it's 83.4 to 88.0.

One situation breaks the pattern: a long task that's graded only at the end. There the model can't catch its own mistakes, so they pile up and the stronger model pulls ahead. SWE-Bench Pro is the clearest case, at 58.6 for GPT-5.5 against 80.3 for Fable 5.

Claude Fable 5 vs. Gemini 3.1 Pro

Gemini 3.1 Pro trails Fable 5 across the board, and the gap widens on the work that matters most. On GDPval-AA knowledge work, it scores 1314 to Fable 5's 1932. On Terminal-Bench 2.1, it's 70.7 to 88.0. The gap is smaller on shorter, contained tasks — spatial reasoning is 26.5 to 38.6 — but it never closes. The pattern points to a model that handles short problems decently but loses the thread on the long, tool-heavy workflows where Fable 5 is built to live.

Testing Claude Fable 5

One of the specific claims in Anthropic's release is that Fable 5 can extract precise numbers from detailed scientific figures without making many mistakes. 

To put this to the test, we created a two-panel scientific figure modeled on the kind of output you'd find in a cell biology paper: overlapping gene expression time series on a log₂ axis, a dual-axis panel pairing percentages with other measurements in arbitrary units, etc.

We thought this would be tricky. The dual y-axes in panel B use different scales and different units. Gene Z crosses into negative log₂ territory in certain places. 

We asked Fable 5 to extract all numeric values from both panels and the table.

please extract all numeric values

The result was basically perfect. My own table used to creat the graph had Gene Z with a value of -0.05 at 48 hours. Fable 5 told me it was -0.04 at this point, not -0.05.

What People Are Saying About Claude Fable 5

In the release, Anthropic shows us early access feedback/product testimonials for customers or partners, like GitHub and Figma. They say positive things, but that’s expected. Every Anthropic model release has this section. 

Claude Fable 5 Testimonials

X gives us a less curated opinion. What jumped out immediately, Andrej Karpathy has given the model a basically glowing review, recognizing that Fable 5 is state-of-the-art on all relevant benchmarks by a margin, and also calling the model a step-change like Opus 4.5 was in November. In the rest of the post, he gives plenty of other ideas that you can do with Fable 5, like: 

  • Bespoke single-use apps
  • Auto-optimize your code
  • Run research projects with custom HTML for the results

Claude Fable 5 Andrej Karpathy

Dan Shipper, the CEO of Every, says similar stuff, that Fable 5 blew away the benchmark tests, for one, and also that it’s a ‘one-shot wonder’ - you can set it to a big project and come back the next morning and have completed work. (He had early access, which is how he knows.)

There is a bit of criticism to balance things out: He says the model is slow and token-hungry, which is another way of saying expensive. He says it’s twice as expensive as Opus. So he acknowledges, smartly, that you should only use the model for stuff that requires it.

What Are Claude Fable 5's Guardrails?

I mentioned that Claude Fable 5 is Claude Mythos 5 but with safeguards. What that means is that Fable 5 is powerful enough that, without restrictions, it could provide meaningful assistance to bad actors in areas like cybersecurity and biology. That’s the belief. 

Here is how the safeguards work at a high level. It's a two-stage classifier system:

  • First, a probe monitors Claude's internal activations across all traffic, and any flagged requests are escalated to a separate trained LLM classifier that makes the final call on whether to block or allow the response. 
  • Second, when a request is blocked, it gets rerouted to Claude Opus 4.8 instead, and the user is notified which model handled their query.

Now, there are three main areas the classifiers cover. 

  • The first is cybersecurity: the system card confirms that Fable 5's classifiers fired on most of the episodes during exploit testing (i.e., can the model assist with hacking?).
  • The second is biology and chemistry: Anthropic's own evaluations showed the underlying model approaching expert-level performance on certain dual-use biological tasks. The system card talks about adeno-associated viruses, which are used for gene therapy, apparently, but also are, well, viruses.
  • The third is distillation: Anthropic has in the past detected large-scale attempts to extract Claude's capabilities to train competing models in authoritarian countries. That extends to building pretraining pipelines or distributed training infrastructure.

Take a look at the before/after story of the safeguards in one visual. The pink bars are Claude Mythos 5 without safeguards — the raw underlying model. It's extremely capable across all four benchmarks: 88.4% on Firefox exploit development, 83.8% on CyberGym vulnerability reproduction, and so on.

The orange bars are Claude Fable 5 with the classifiers active, and they're all at 0.0% across every benchmark. The safeguards flatline the performance, which is good - zero is what they're going for.

Claude Fable 5 Cybersecurity Eval

I like this graph because it puts Opus 4.8 in context. Even without safeguards, Opus 4.8 scores a lot lower than Mythos 5. This shows us why the classifiers were necessary in the first place. The jump in capability between Opus and Mythos is what made broad release risky and delayed.

Claude Fable 5 Access and Pricing

If you're on a Pro, Max, Team, or Enterprise plan, Fable 5 is available to you right now at no extra cost — but only until June 22.

how to access Claude Fable 5

After June 22, access will require usage credits, which is Anthropic's pay-as-you-go system that sits on top of your existing subscription. This is probably part of why Dan Shipper warned us about costs. 

For developers on the API, it's more simple: 

  • Fable 5 is fully available today at $10 per million input tokens 
  • and $50 per million output tokens. 

If you compare against Opus 4.8's standard pricing ($5 input / $25 output), Fable 5 is indeed twice as expensive, as Dan says. But if you're already using Opus 4.8 in fast mode, Fable 5 costs the same while being a more capable model.

Important note, in case you were wondering like I was: When a query falls back to Opus 4.8, as it does with its safety classifier system, you're billed at Opus 4.8 rates rather than Fable 5 rates.

Claude Mythos 5, the Mythos model version without safeguards, is not available to the general public. Access is restricted to Project Glasswing partners.

Conclusion

Anthopic had teased us with Project Glasswing and the tantalizingly named Mythos model. But for months, this best-in-the-world model - and it really is the best in the world - was locked behind a government partnership. 

That’s changed, one week after Anthropic filed for a U.S. IPO. Now, the Mythos-class model is available on a Pro plan. (Access is also based on usage credits. See the last section to get into all that.)

Where we're at: Anthropic has models that are capable enough to cause serious harm without restrictions. We know it's not marketing copy because the company built a whole parallel classifier system to add safeguards.

Where we're going: The capability ceiling keeps moving up, the benchmark tests keep getting replaced with harder versions, and the model release timeline keeps getting shorter. Also, the safeguard infrastructure is getting more and more elaborate to compensate. With the IPO coming possibly this year, there's going to be a lot of talk about how competitive pressure runs against issues of safety.


Josef Waples's photo
Author
Josef Waples

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess! 

Claude Fable FAQs

What is the difference between Claude Fable 5 and Claude Mythos 5?

They share the same underlying model. Fable 5 is the public-facing version with safety classifiers active, routing sensitive cybersecurity and biology queries to Opus 4.8. Mythos 5 has those classifiers lifted and is restricted to Project Glasswing partners and vetted biomedical researchers.

How much does Claude Fable 5 cost?

Via the API, Fable 5 is priced at $10 per million input tokens and $50 per million output tokens — twice Opus 4.8's standard pricing. Pro, Max, Team, and Enterprise subscribers have free access until June 22, after which usage credits are required.

Is Claude Fable 5 available on Claude.ai?

Yes. It's currently available at no extra cost on Pro, Max, Team, and Enterprise plans through June 22, 2026. After that date, access requires usage credits on top of your existing subscription.

What benchmarks does Claude Fable 5 lead?

Fable 5 tops SWE-Bench Pro (80.3%), FrontierCode Diamond, Humanity's Last Exam (64.5% with tools), OSWorld-Verified, GDPval-AA, and Terminal-Bench 2.1 — leading every major benchmark in the comparison table.

What happens when Claude Fable 5 refuses a request?

It doesn't outright refuse — it reroutes the query to Claude Opus 4.8 and notifies the user which model responded. Anthropic estimates this happens in fewer than 5% of queries, though that number varies depending on the type of work.

Topics

Learn with DataCamp

Course

Software Development with Claude Code

4 hr
3.6K
Claude Code brings AI assistance to your terminal. Learn the workflows that turn it into a reliable tool for real software development.
See DetailsRight Arrow
Start Course
See MoreRight Arrow
Related

blog

Claude Mythos 5: Features, Benchmarks, and What It Can Do

Anthropic's most capable model yet, Claude Mythos 5 brings Mythos-class AI to cybersecurity, drug design, and scientific research with the safeguards lifted for trusted partners.
Tom Farnschläder's photo

Tom Farnschläder

11 min

blog

Claude Opus 4.7: Anthropic’s New Best (Available) Model

Explore what's new in Anthropic's latest flagship: stronger agentic coding, sharper vision, and better memory across sessions. Compare the benchmarks against GPT-5.4, Gemini 3.1 Pro, and the locked-away Mythos Preview.
Josef Waples's photo

Josef Waples

9 min

blog

Claude Opus 4.5: Benchmarks, Agents, Tools, and More

Discover Claude Opus 4.5 by Anthropic, its best model yet for coding, agents, and computer use. See benchmark results, new tools, and real-world tests.
Josef Waples's photo

Josef Waples

10 min

blog

Claude Opus 4.6: Features, Benchmarks, Hands-On Tests, and More

Anthropic’s latest model tops leaderboards in agentic coding and complex reasoning. Plus, it has a 1M context window.
Matt Crabtree's photo

Matt Crabtree

10 min

blog

Anthropic Computer Use: Automate Your Desktop With Claude 3.5

Discover Anthropic’s new computer use feature and let Claude manage your workspace and automate your tasks. Simply type the prompt, and Claude will handle the rest.
Abid Ali Awan's photo

Abid Ali Awan

9 min

Tutorial

Imagine with Claude: A Guide With Practical Examples

Learn how Anthropic's Imagine with Claude introduces a new paradigm for AI-assisted software development, generating functionality on the fly.
François Aubry's photo

François Aubry

See MoreSee More