Claude Fable 5: A Mythos-Class Model You Can Use

Anthropic's Claude Fable 5 is the new state-of-the-art AI model, delivering a clean sweep of every major benchmark including SWE-Bench Pro, FrontierCode Diamond, and Humanity's Last Exam.

Updated Jul 1, 2026 · 10 min read

Explore with AI

Open in ChatGPT Open in Claude Open in Perplexity

Update (July 1, 2026): Access to Claude Fable 5 has been restored, now available globally on Claude Platform, Claude.ai, Claude Code, and Claude Cowork after the export control order was lifted. Mythos 5 remains limited to vetted Project Glasswing partners.

We first started hearing about Claude Mythos back in April when we wrote about the release of Claude Opus 4.7. We understood at the time that Opus 4.7, while extremely impressive in its own right, was only the best publicly available model because the Mythos-class models were being held back from broad release.

Well, Anthropic has now given us Claude Fable 5, a Mythos-class model safe for general use. It’s undoubtedly the new leader in the important benchmark tests, including things like knowledge work, software engineering, and scientific research. It's a big win for Anthropic, which only eight days ago filed for a U.S. IPO.

In this article, we will take a close look at Claude Fable 5, how it compares with other models on the market, like GPT-5.5 and Gemini 3.1 Pro, as well as Claude's own models, like Opus 4.8. We will also offer some guidance on when to use the model because it's more expensive.

What Is Claude Fable 5?

Claude Fable 5 is the name Anthropic is giving to its newest and most capable model. It is a Mythos-class model, but it includes important safeguards in important areas, namely cybersecurity.

For the user, this means that certain questions will be answered by the next-in-line model, Claude Opus 4.8. Anthropic estimates that this kind of rerouting happens in fewer than 5% of all queries, which Anthropic thinks is conservative. That’s just as well, since cybersecurity concerns are involved. Also, of course, that percent is going to vary a lot depending on what kind of work you do.

We are especially excited about Claude Fable 5. The model is so capable now that it has moved into new areas of usefulness: Cyber defenders will use the model to secure critical systems, and life science researchers could use the model to improve new therapeutics. Astrophysicists can use these models to map galaxies. On and on.

If you read the release, you might have heard of Claude Mythos 5 also. If you’re wondering: Claude Mythos 5 is the same underlying model but with some of the safeguards lifted. It’s not for general use: It’s deployed at part of Project Glasswing in partnership with the U.S. government, effectively replacing Claude Mythos Preview. We don’t know as much about Claude Mythos 5, but we do know that it has the strongest cybersecurity capabilities of any model in the world. Anthropic’s partnership with the U.S. government is evidence of that itself.

What's New with Claude Fable 5

Fable 5 introduces several notable upgrades across safety, transparency, memory, and data handling.

Safety classifiers

The biggest structural addition in Fable 5 is a two-stage classifier system designed to prevent misuse of the model's most dangerous capabilities. A probe monitors the model's internal activations across all traffic, and any flagged requests are escalated to a separate trained LLM classifier that makes the final call. When a request is blocked, it's rerouted to Opus 4.8 instead of being outright refused, so the user still gets a useful response.

Fallback notification system

This is similar to the point above. When Fable 5's classifiers reroute a query to Opus 4.8, users are notified which model handled their request. This is a new transparency behavior, so users wouldn’t have to wonder if a response felt different.

File-based memory as a thinking tool

Fable 5 can write notes to a file mid-task and refer back to them later, which is distinct from simply having a long context window. This is an active working memory behavior. Anthropic's own testing showed it improved performance significantly more than it did for Opus 4.8.

30-day data retention policy

Anthropic is introducing a new data retention policy that applies to all Fable 5 traffic. Data is retained for 30 days to help detect novel jailbreaks and reduce false positives. Anthropic says this data won't be used for model training, and human access to it is logged.

Claude Fable 5 Benchmark Results

Claude Fable 5 wins hands-down on all of the benchmark tests that we watch closely. On SWE-Bench Pro, it scores 80.3%, ahead of Opus 4.8's 69.2%. On Humanity's Last Exam with tools, it scores 64.5%, compared to 57.9% for Opus 4.8 and 52.2% for GPT-5.5.

SWE-Bench Pro
Frontier Code (Diamond)
GDPval-AA
OSWorld-Verified
Humanity’s Last Exam
Terminal-Bench 2.1

Category	Benchmark	Claude Mythos 5 / Fable 5	Claude Mythos Preview	Claude Opus 4.8	GPT 5.5	Gemini 3.1 Pro
Agentic coding	SWE-Bench Pro	80.3%	77.8%	69.2%	58.6%	54.2%
Agentic coding	FrontierCode (Diamond)	29.3% (xhigh)	—	13.4% (xhigh)	5.7% (xhigh)	—
Knowledge work	GDPval-AA	1932	—	1890	1769	1314
Knowledge work vision	GDP.pdf	29.8% (no tools)	—	22.5% (no tools)	24.9% (no tools)	16.7% (no tools)
Spatial reasoning	Blueprint-Bench 2	38.6%	—	14.5%	36.2%	26.5%
Tool use	AutomationBench	17.4%	—	15.5%	12.9%	9.6%
Computer use	OSWorld-Verified	85.0%	85.4%	83.4%	78.7%	76.2%
Legal	Legal Agent Benchmark	13.3%	—	10.4%	2.1%	0.0%
Multidisciplinary reasoning	Humanity's Last Exam (no tools)	59.0%*	56.8%	49.8%	41.4%	44.4%
Multidisciplinary reasoning	Humanity's Last Exam (with tools)	64.5%*	64.7%	57.9%	52.2%	51.4%
Biology	BioMysteryBench (hard)	46.1%*	29.6%	40.0%	—	—
Biology	BioMysteryBench (human solved)	83.9%*	82.6%	80.4%	—	—
Agentic coding	Terminal-Bench 2.1	88.0%*	—	82.7%	83.4% (Codex CLI)	70.7% (Gemini CLI)
Cybersecurity	ExploitBench (Cap%)	78.0%*	69.0%	40.0%	34.0%	—
Health	HealthBench Professional	66.0%*	64.7%	56.9%	51.8%	—

Claude Fable 5 for software engineering

I mentioned the FrontierCode benchmark test but didn’t get a chance to go into it. FrontierCode tests if a model can not only pass a coding test, which itself is difficult, but also if it can meet higher, more general standards for high-quality production codebases, which would mean that the code would have to be performant at scale, idiomatic, structured for long-term maintainability, etc.

Claude Fable 5 scores higher on the FrontierCode test than any other model without even using the high or xhigh effort parameters.

Claude Fable 5 for vision

Claude Fable 5 is the new best model for vision tasks. In Anthropic's testing, Fable 5 could rebuild a web application's source code from screenshots alone with no scaffolding. This is the same capability that shows up in its 93.2% score on CharXiv Reasoning with tools.

The Fable 5 model also does extremely well on a problem that more people probably encounter: It can extract precise numbers from detailed scientific figures without making many mistakes.

Claude Fable 5 memory and long-context

We have been talking a lot about AI models’ memory and long-context abilities all year. With Fable 5, Anthropic brings us some new things to talk about.

For one, Fable 4 stays focused across millions of tokens. If you’re not familiar with the idea, AI models, when doing long tasks, can ‘forget’ or lose track of earlier context as the conversation gets longer and longer. But Fable 5 is better at maintaining coherent, consistent behavior even when the task is very long. How long is very long? For context, a million tokens is a 700-page novel.

Anthropic also tells us, interestingly, that Fable 5 improves its outputs using its own notes. So, the model can write things down mid-task (to a file, as an example) and refer back to them (the things it wrote down) later. It’s akin to a person jotting notes while working through a problem. This is different from having a long memory. T model uses external storage as a thinking tool.

Claude Fable 5 vs. GPT-5.5 vs. Gemini 3.1 Pro

The table above already shows Fable 5 winning in every category. But one useful question is the shape of each lead: where it's a chasm, where it's a coin flip, and why.

Claude Fable 5 vs. GPT-5.5

GPT-5.5 is the closest competitor, and on most tasks the two are nearly tied. Blueprint-Bench 2 spatial reasoning is 36.2 for GPT-5.5 and 38.6 for Fable 5. They stay close even on long agentic tasks, as long as the model gets feedback as it works. On Terminal-Bench 2.1, where the terminal flags errors right away, it's 83.4 to 88.0.

One situation breaks the pattern: a long task that's graded only at the end. There, the model can't catch its own mistakes, so they pile up, and the stronger model pulls ahead. SWE-Bench Pro is the clearest case, at 58.6 for GPT-5.5 against 80.3 for Fable 5.

For a detailed comparison, read our post: Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose.

Claude Fable 5 vs. Gemini 3.1 Pro

Gemini 3.1 Pro trails Fable 5 across the board, and the gap widens on the work that matters most. On GDPval-AA knowledge work, it scores 1314 to Fable 5's 1932. On Terminal-Bench 2.1, it's 70.7 to 88.0. The gap is smaller on shorter, contained tasks — spatial reasoning is 26.5 to 38.6 — but it never closes. The pattern points to a model that handles short problems decently but loses the thread on the long, tool-heavy workflows where Fable 5 is built to live.

Testing Claude Fable 5

One of the specific claims in Anthropic's release is that Fable 5 can extract precise numbers from detailed scientific figures without making many mistakes.

To put this to the test, we created a two-panel scientific figure modeled on the kind of output you'd find in a cell biology paper: overlapping gene expression time series on a log₂ axis, a dual-axis panel pairing percentages with other measurements in arbitrary units, etc.

We thought this would be tricky. The dual y-axes in panel B use different scales and different units. Gene Z crosses into negative log₂ territory in certain places.

We asked Fable 5 to extract all numeric values from both panels and the table.

please extract all numeric values

The result was basically perfect. My own table used to create the graph had Gene Z with a value of -0.05 at 48 hours. Fable 5 told me it was -0.04 at this point, not -0.05.

What People Are Saying About Claude Fable 5

In the release, Anthropic shows us early access feedback/product testimonials for customers or partners, like GitHub and Figma. They say positive things, but that’s expected. Every Anthropic model release has this section.

X gives us a less curated opinion. What jumped out immediately, Andrej Karpathy has given the model a basically glowing review, recognizing that Fable 5 is state-of-the-art on all relevant benchmarks by a margin, and also calling the model a step-change like Opus 4.5 was in November. In the rest of the post, he gives plenty of other ideas that you can do with Fable 5, like:

Bespoke single-use apps
Auto-optimize your code
Run research projects with custom HTML for the results

Dan Shipper, the CEO of Every, says similar stuff, that Fable 5 blew away the benchmark tests, for one, and also that it’s a ‘one-shot wonder’ - you can set it to a big project and come back the next morning and have completed work. (He had early access, which is how he knows.)

There is a bit of criticism to balance things out: He says the model is slow and token-hungry, which is another way of saying expensive. He says it’s twice as expensive as Opus. So he acknowledges, smartly, that you should only use the model for stuff that requires it.

What Are Claude Fable 5's Guardrails?

I mentioned that Claude Fable 5 is Claude Mythos 5, but with safeguards. What that means is that Fable 5 is powerful enough that, without restrictions, it could provide meaningful assistance to bad actors in areas like cybersecurity and biology. That’s the belief.

Here is how the safeguards work at a high level. It's a two-stage classifier system:

First, a probe monitors Claude's internal activations across all traffic, and any flagged requests are escalated to a separate trained LLM classifier that makes the final call on whether to block or allow the response.
Second, when a request is blocked, it gets rerouted to Claude Opus 4.8 instead, and the user is notified which model handled their query.

Now, there are three main areas the classifiers cover.

The first is cybersecurity: the system card confirms that Fable 5's classifiers fired on most of the episodes during exploit testing (i.e., can the model assist with hacking?).
The second is biology and chemistry: Anthropic's own evaluations showed the underlying model approaching expert-level performance on certain dual-use biological tasks. The system card talks about adeno-associated viruses, which are used for gene therapy, apparently, but also are, well, viruses.
The third is distillation: Anthropic has, in the past, detected large-scale attempts to extract Claude's capabilities to train competing models in authoritarian countries. That extends to building pretraining pipelines or distributed training infrastructure.

Take a look at the before/after story of the safeguards in one visual. The pink bars are Claude Mythos 5 without safeguards — the raw underlying model. It's extremely capable across all four benchmarks: 88.4% on Firefox exploit development, 83.8% on CyberGym vulnerability reproduction, and so on.

The orange bars are Claude Fable 5 with the classifiers active, and they're all at 0.0% across every benchmark. The safeguards flatline the performance, which is good - zero is what they're going for.

I like this graph because it puts Opus 4.8 in context. Even without safeguards, Opus 4.8 scores a lot lower than Mythos 5. This shows us why the classifiers were necessary in the first place. The jump in capability between Opus and Mythos is what made broad release risky and delayed.

Claude Fable 5 Access and Pricing

If you're on a Pro, Max, Team, or Enterprise plan, Fable 5 is available to you right now at no extra cost — but only until June 22.

After June 22, access will require usage credits, which is Anthropic's pay-as-you-go system that sits on top of your existing subscription. This is probably part of why Dan Shipper warned us about costs.

For developers on the API, it's simpler:

Fable 5 is fully available today at $10 per million input tokens
and $50 per million output tokens.

If you compare against Opus 4.8's standard pricing ($5 input / $25 output), Fable 5 is indeed twice as expensive, as Dan says. But if you're already using Opus 4.8 in fast mode, Fable 5 costs the same while being a more capable model.

Important note, in case you were wondering like I was: When a query falls back to Opus 4.8, as it does with its safety classifier system, you're billed at Opus 4.8 rates rather than Fable 5 rates.

Claude Mythos 5, the Mythos model version without safeguards, is not available to the general public. Access is restricted to Project Glasswing partners.

Conclusion

Anthopic had teased us with Project Glasswing and the tantalizingly named Mythos model. But for months, this best-in-the-world model - and it really is the best in the world - was locked behind a government partnership.

That’s changed, one week after Anthropic filed for a U.S. IPO. Now, the Mythos-class model is available on a Pro plan. (Access is also based on usage credits. See the last section to get into all that.)

Where we're at: Anthropic has models that are capable enough to cause serious harm without restrictions. We know it's not marketing copy because the company built a whole parallel classifier system to add safeguards.

Where we're going: The capability ceiling keeps moving up, the benchmark tests keep getting replaced with harder versions, and the model release timeline keeps getting shorter. Also, the safeguard infrastructure is getting more and more elaborate to compensate. With the IPO coming possibly this year, there's going to be a lot of talk about how competitive pressure runs against issues of safety.

Author

Josef Waples

What is the difference between Claude Fable 5 and Claude Mythos 5?

How much does Claude Fable 5 cost?

Is Claude Fable 5 available on Claude.ai?

What benchmarks does Claude Fable 5 lead?

What happens when Claude Fable 5 refuses a request?

Topics

Artificial Intelligence

Learn with DataCamp

Course

Software Development with Claude Code

4 hr

5.2K

Claude Code brings AI assistance to your terminal. Learn the workflows that turn it into a reliable tool for real software development.

See Details

Start Course

Course

Introduction to Agent Skills

2 hr 30 min

3.1K

Learn how to build, configure, and share Skills in Claude Code — reusable markdown instructions that Claude automatically applies to tasks at the right time.

See Details

Start Course

Course

Claude Code 101

3 hr

20.5K

Learn how to use Claude Code effectively in your daily development workflows.

See Details

Start Course

blog

Claude Mythos 5: Features, Benchmarks, and What It Can Do

Anthropic's most capable model yet, Claude Mythos 5 brings Mythos-class AI to cybersecurity, drug design, and scientific research with the safeguards lifted for trusted partners.

Tom Farnschläder

11 min

blog

Claude Opus 4.7: Anthropic’s New Best (Available) Model

Explore what's new in Anthropic's latest flagship: stronger agentic coding, sharper vision, and better memory across sessions. Compare the benchmarks against GPT-5.4, Gemini 3.1 Pro, and the locked-away Mythos Preview.

Josef Waples

9 min

blog

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Fable 5 leads on raw capability benchmarks, but GPT-5.5 wins on access, pricing, and fewer classifier interruptions. Here's how to choose.

Tom Farnschläder

11 min

blog

Claude Fable 5 vs. Gemini 3.5 Flash: Benchmarks, Pricing, and More

Claude Fable 5 dominates on raw capability, but Gemini 3.5 Flash delivers near-frontier performance at a fraction of the cost and several times the speed. Keep reading to learn more.

Josef Waples

9 min

blog

Claude Opus 4.5: Benchmarks, Agents, Tools, and More

Discover Claude Opus 4.5 by Anthropic, its best model yet for coding, agents, and computer use. See benchmark results, new tools, and real-world tests.

Josef Waples

10 min

blog

Claude Opus 4.6: Features, Benchmarks, Hands-On Tests, and More

Anthropic’s latest model tops leaderboards in agentic coding and complex reasoning. Plus, it has a 1M context window.

Matt Crabtree

10 min

See More See More

What Is Claude Fable 5?

What's New with Claude Fable 5

Safety classifiers

Fallback notification system

File-based memory as a thinking tool

30-day data retention policy

Claude Fable 5 Benchmark Results

Claude Fable 5 for software engineering

Claude Fable 5 for vision

Claude Fable 5 memory and long-context

Claude Fable 5 vs. GPT-5.5 vs. Gemini 3.1 Pro

Claude Fable 5 vs. GPT-5.5

Claude Fable 5 vs. Gemini 3.1 Pro

Testing Claude Fable 5

What People Are Saying About Claude Fable 5

What Are Claude Fable 5's Guardrails?

Claude Fable 5 Access and Pricing

Conclusion

Claude Fable FAQs

Is Claude Fable 5 available on Claude.ai?

What benchmarks does Claude Fable 5 lead?

What happens when Claude Fable 5 refuses a request?

Claude Mythos 5: Features, Benchmarks, and What It Can Do

Claude Opus 4.7: Anthropic’s New Best (Available) Model

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Fable 5 vs. Gemini 3.5 Flash: Benchmarks, Pricing, and More

Claude Opus 4.5: Benchmarks, Agents, Tools, and More

Claude Opus 4.6: Features, Benchmarks, Hands-On Tests, and More

.css-1531qan{-webkit-text-decoration:none;text-decoration:none;color:inherit;}Software Development with Claude Code

Introduction to Agent Skills

Claude Code 101

Claude Mythos 5: Features, Benchmarks, and What It Can Do

Claude Opus 4.7: Anthropic’s New Best (Available) Model

Claude Fable 5 vs GPT-5.5: Benchmarks, Pricing, and Which to Choose

Claude Fable 5 vs. Gemini 3.5 Flash: Benchmarks, Pricing, and More

Claude Opus 4.5: Benchmarks, Agents, Tools, and More

Claude Opus 4.6: Features, Benchmarks, Hands-On Tests, and More

Software Development with Claude Code