Claude Mythos 5: Features, Benchmarks, and What It Can Do

Anthropic's most capable model yet, Claude Mythos 5 brings Mythos-class AI to cybersecurity, drug design, and scientific research with the safeguards lifted for trusted partners.

Updated July 1, 2026 · 11 min read

Update (July 1, 2026): Access to Claude Fable 5 has been restored, now available globally on Claude Platform, Claude.ai, Claude Code, and Claude Cowork after the export control order was lifted. Mythos 5 remains limited to vetted Project Glasswing partners.

Anthropic launched two models on June 9, 2026: Claude Fable 5, the public-facing Mythos-class model with conservative safety guardrails, and Claude Mythos 5, the same underlying model with those guardrails lifted for a select group of trusted partners. This article is about Mythos 5, the version Anthropic describes as having "the strongest cybersecurity capabilities of any model in the world."

In this article, I'll cover what Claude Mythos 5 is, what it can do across software engineering, life sciences, and scientific research, how it performs on benchmarks, and who can access it. You can also check out our coverage of Claude Opus 4.8 for context on where Mythos 5 sits relative to Anthropic's broader model family.

What Is Claude Mythos 5?

Claude Mythos 5 is Anthropic's highest-capability model, sitting above the Opus class in what Anthropic calls the Mythos tier. The first Mythos-class model, Claude Mythos Preview, was released in April 2026 through Project Glasswing, a collaboration with the US Government focused on cybersecurity. Mythos 5 is the second release in this tier and a direct upgrade to Mythos Preview.

Mythos 5 and Fable 5 share the same underlying architecture. The difference is the safeguards: Fable 5 ships with classifiers that route sensitive cybersecurity and biology queries to Claude Opus 4.8 instead. Mythos 5 has those classifiers lifted in specific areas for partners who have been vetted through the trusted access program. Anthropic is explicit that the name difference reflects the safeguard difference, not a capability difference.

The headline benchmark claim is that Mythos 5 scores 80.3% on SWE-Bench Pro, compared to 77.8% for Mythos Preview and 69.2% for Opus 4.8. On Humanity's Last Exam with tools, it scores 64.5%, ahead of Opus 4.8's 57.9% and GPT 5.5's 52.2%. These are not marginal improvements over the Opus class.

What's New With Claude Mythos 5?

Mythos 5 represents a step up from Mythos Preview across every major capability area Anthropic has tested. The gains are most visible in long-horizon autonomous work, especially in scientific reasoning and vision tasks. Here's what that looks like in practice.

Secure autonomous software engineering at scale

Mythos 5 can work autonomously on large codebases for longer than any previous Claude model. Stripe reported the model compressed months of engineering work into days, completing a codebase-wide migration across a 50-million-line Ruby codebase in a single day. On FrontierCode (Diamond), it scores highest among frontier models even at medium effort.

For security work, Mythos 5 extends the capabilities that made Mythos Preview valuable to Project Glasswing partners. Those partners used Mythos Preview to identify over 10,000 high and critical security flaws across production systems.

Drug design and protein engineering

Anthropic's internal protein design team used Mythos 5 to accelerate drug design by roughly ten times. In a controlled comparison, Mythos 5 matched or beat skilled human operators across 14 protein targets for the full pipeline:

choosing binding sites
selecting tools
recovering from failures

Nine of the 14 targets yielded strong drug design candidates that are currently under investigation.

Novel scientific hypothesis generation

Mythos 5 is Anthropic's first model to consistently produce novel scientific hypotheses rather than summarize existing literature. In blinded comparisons, Anthropic's scientists preferred its molecular biology hypotheses roughly 80% of the time, and several have been advanced to experimental evaluation. One hypothesis about a novel E. coli protein mechanism was independently corroborated by a lab working on the same problem.

Autonomous genomics research

Mythos 5 conducted novel genomics research over more than a week of largely autonomous work, assembling single-cell data for millions of cells across 138 animal species and training a custom ML model to identify equivalent cell types across distantly related organisms. The trained model outperformed a recent Science-published model despite being 100 times smaller.

Vision and long-context performance

Mythos 5 scores 93.2% on CharXiv Reasoning with tools and can extract precise numbers from detailed scientific figures or rebuild a web app from screenshots alone. On long-context tasks, giving Mythos 5 file-based memory improved its performance three times more than the same setup improved Opus 4.8, and it reached the final act of Slay the Spire three times more often.

Claude Mythos 5 Benchmarks

Mythos 5 leads or ties on nearly every benchmark Anthropic tested, with gains over Opus 4.8 that are consistent across categories rather than concentrated in one area. The comparison table pits it against Claude Mythos Preview, Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro.

Category	Benchmark	Claude Mythos 5 / Fable 5	Claude Mythos Preview	Claude Opus 4.8	GPT 5.5	Gemini 3.1 Pro
Agentic coding	SWE-Bench Pro	80.3%	77.8%	69.2%	58.6%	54.2%
Agentic coding	FrontierCode (Diamond)	29.3% (xhigh)	—	13.4% (xhigh)	5.7% (xhigh)	—
Knowledge work	GDPval-AA	1932	—	1890	1769	1314
Knowledge work vision	GDP.pdf	29.8% (no tools)	—	22.5% (no tools)	24.9% (no tools)	16.7% (no tools)
Spatial reasoning	Blueprint-Bench 2	38.6%	—	14.5%	36.2%	26.5%
Tool use	AutomationBench	17.4%	—	15.5%	12.9%	9.6%
Computer use	OSWorld-Verified	85.0%	85.4%	83.4%	78.7%	76.2%
Legal	Legal Agent Benchmark	13.3%	—	10.4%	2.1%	0.0%
Multidisciplinary reasoning	Humanity's Last Exam (no tools)	59.0%*	56.8%	49.8%	41.4%	44.4%
Multidisciplinary reasoning	Humanity's Last Exam (with tools)	64.5%*	64.7%	57.9%	52.2%	51.4%
Biology	BioMysteryBench (hard)	46.1%*	29.6%	40.0%	—	—
Biology	BioMysteryBench (human solved)	83.9%*	82.6%	80.4%	—	—
Agentic coding	Terminal-Bench 2.1	88.0%*	—	82.7%	83.4% (Codex CLI)	70.7% (Gemini CLI)
Cybersecurity	ExploitBench (Cap%)	78.0%*	69.0%	40.0%	34.0%	—
Health	HealthBench Professional	66.0%*	64.7%	56.9%	51.8%	—

Anthropic reports Mythos 5 and Fable 5 scores together, noting they fall within 1–3 percentage points in most cases. Benchmarks marked with an asterisk (*) show a larger gap because Fable 5's safety classifiers route sensitive queries to Opus 4.8; on those benchmarks, Fable 5 performs closer to the Opus class.
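The gaps quoted in the sections that follow can be recomputed directly from the table. Here is a minimal Python sketch of that arithmetic. It is not something Anthropic publishes: the `SCORES` dictionary is hand-copied from the percentage rows above (GDPval-AA is left out because it uses a different scale), and the function name `gaps_over_opus` is made up for illustration.

```python
# Scores hand-copied from the comparison table above, as (Mythos 5, Opus 4.8) in percent.
# GDPval-AA is omitted because it is reported on a different numerical scale.
SCORES = {
    "SWE-Bench Pro": (80.3, 69.2),
    "FrontierCode (Diamond)": (29.3, 13.4),
    "GDP.pdf": (29.8, 22.5),
    "Blueprint-Bench 2": (38.6, 14.5),
    "AutomationBench": (17.4, 15.5),
    "OSWorld-Verified": (85.0, 83.4),
    "Legal Agent Benchmark": (13.3, 10.4),
    "Humanity's Last Exam (no tools)": (59.0, 49.8),
    "Humanity's Last Exam (with tools)": (64.5, 57.9),
    "BioMysteryBench (hard)": (46.1, 40.0),
    "BioMysteryBench (human solved)": (83.9, 80.4),
    "Terminal-Bench 2.1": (88.0, 82.7),
    "ExploitBench (Cap%)": (78.0, 40.0),
    "HealthBench Professional": (66.0, 56.9),
}


def gaps_over_opus(scores):
    """Return {benchmark: (point_gap, ratio)} for Mythos 5 relative to Opus 4.8."""
    return {
        name: (round(mythos - opus, 1), round(mythos / opus, 2))
        for name, (mythos, opus) in scores.items()
    }


if __name__ == "__main__":
    gaps = gaps_over_opus(SCORES)
    # Sort by absolute gap in percentage points, largest first.
    for name, (points, ratio) in sorted(gaps.items(), key=lambda kv: -kv[1][0]):
        print(f"{name:<36} +{points:>4.1f} pts  x{ratio:.2f}")
```

Sorted this way, ExploitBench has the largest absolute gap (38.0 points), while Blueprint-Bench 2 has the largest ratio (about 2.66x). Keep in mind that the starred rows apply to Mythos 5 only, since Fable 5 scores lower on those benchmarks.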

Agentic coding: SWE-Bench Pro, FrontierCode, and Terminal-Bench 2.1

On SWE-Bench Pro, Mythos 5 scores 80.3%, compared to 77.8% for Mythos Preview, 69.2% for Opus 4.8, 58.6% for GPT 5.5, and 54.2% for Gemini 3.1 Pro. The 11-point gap over Opus 4.8 is substantial on a benchmark designed to resist ground-truth leakage.

On FrontierCode (Diamond), which measures high-quality, maintainable agentic code rather than raw task completion, the separation is even sharper. Mythos 5 scores 29.3% at the xhigh effort level, compared to 13.4% for Opus 4.8 and 5.7% for GPT 5.5.

For terminal work, Mythos 5 reclaims the lead for Anthropic from OpenAI. It scores 88.0%* on Terminal-Bench 2.1, compared to 82.7% for Opus 4.8, 83.4% for GPT 5.5 (with Codex CLI), and 70.7% for Gemini 3.1 Pro (with Gemini CLI).

Knowledge work: GDPval-AA and GDPpdf

GDPval-AA measures knowledge work performance on a numerical scale. Mythos 5 scores 1932, compared to 1890 for Opus 4.8, 1769 for GPT 5.5, and 1314 for Gemini 3.1 Pro.

The lead is more pronounced for knowledge work on PDF documents without tool access. Mythos 5 scores 29.8% on GDPpdf, compared to 22.5% for Opus 4.8, 24.9% for GPT 5.5, and 16.7% for Gemini 3.1 Pro, which makes GPT 5.5 rather than Opus 4.8 the runner-up here.

Multidisciplinary reasoning: Humanity's Last Exam

Humanity's Last Exam (HLE) tests graduate-level reasoning across science, mathematics, and humanities. Mythos 5 scores 59.0%* without tools and 64.5%* with tools. Mythos Preview scores 56.8% and 64.7% respectively, which puts it 2.2 points behind without tools and 0.2 points ahead with them, essentially a tie. The distance to Opus 4.8 is already significant (49.8% without tools, 57.9% with), and the gap to the flagship competitor models is larger still (GPT 5.5: 41.4% and 52.2%; Gemini 3.1 Pro: 44.4% and 51.4%).

The gap between Mythos 5 and the rest of the field is clearest in the no-tools condition, where it leads Opus 4.8 by over 9 points. These are starred scores, meaning Fable 5 performs somewhat lower due to its safety classifiers.
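Another way to read these numbers is to ask how much each model gains when tools are switched on. The short Python sketch below does that subtraction. The `HLE` dictionary is hand-copied from the comparison table, and `tool_uplift` is a hypothetical helper name, not part of any benchmark tooling.

```python
# (no tools, with tools) HLE scores in percent, hand-copied from the comparison table.
HLE = {
    "Claude Mythos 5": (59.0, 64.5),
    "Claude Mythos Preview": (56.8, 64.7),
    "Claude Opus 4.8": (49.8, 57.9),
    "GPT 5.5": (41.4, 52.2),
    "Gemini 3.1 Pro": (44.4, 51.4),
}


def tool_uplift(scores):
    """Return {model: percentage points gained when tools are enabled}."""
    return {
        model: round(with_tools - no_tools, 1)
        for model, (no_tools, with_tools) in scores.items()
    }


if __name__ == "__main__":
    for model, uplift in tool_uplift(HLE).items():
        print(f"{model:<22} +{uplift:.1f} pts with tools")
```

On these figures, Mythos 5 gains the least from tools (5.5 points) and GPT 5.5 the most (10.8 points), which fits the observation that the lead is clearest in the no-tools condition. This is simple arithmetic on reported scores, so it says nothing about why the uplift differs.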

Computer use, tool use, and spatial reasoning

On OSWorld-Verified, which tests the model's ability to complete tasks on a real computer interface, Mythos 5 scores 85.0%. Mythos Preview edges ahead at 85.4%, making this one of only two rows in the table where Mythos Preview leads (the other is Humanity's Last Exam with tools, at 64.7% versus 64.5%). Opus 4.8 comes close at 83.4%, while GPT 5.5 (78.7%) and Gemini 3.1 Pro (76.2%) fall further behind.

AutomationBench measures tool use capabilities. Mythos 5 scores 17.4%, compared to 15.5% for Opus 4.8, 12.9% for GPT 5.5, and 9.6% for Gemini 3.1 Pro. The low absolute numbers across the board suggest tool use remains a hard problem for all frontier models.

Spatial reasoning is where Mythos 5's lead over Opus 4.8 is largest in relative terms. It scores 38.6% on Blueprint-Bench 2, more than double Opus 4.8's 14.5%. GPT 5.5 is much closer at 36.2%, and Gemini 3.1 Pro scores 26.5%.

Cybersecurity and biology

These are the two areas that arguably received the most attention in the release notes, and the results show why.

ExploitBench measures the fraction of exploits the model can successfully reproduce (Cap%). Mythos 5 scores 78.0%*, a significant improvement over Mythos Preview (69.0%) and a dramatic increase over Opus 4.8 (40.0%) and GPT 5.5 (34.0%).

The 38-point gap over Opus 4.8 is the largest single-benchmark lead in the comparison table, and it explains why the cyber safeguards exist for Fable 5. Anthropic's external red-teaming found no universal jailbreaks on long-form agentic tasks, though the UK AISI made progress toward one in an initial testing window.

BioMysteryBench tests biological reasoning at two difficulty levels. On the hard subset, Mythos 5 scores 46.1%*, compared to 29.6% for Mythos Preview and 40.0% for Opus 4.8. On the human-solved subset, Mythos 5 scores 83.9%*, Mythos Preview scores 82.6%, and Opus 4.8 scores 80.4%. GPT 5.5 and Gemini 3.1 Pro do not have reported scores on either subset.

As with ExploitBench, Fable 5's scores are closer to Opus 4.8 due to biology-related safety classifiers.

Health and legal

Claude Mythos 5 demonstrates notable strength in two high-stakes professional domains where accuracy and reasoning quality carry real-world consequences: medicine and law.

On HealthBench Professional, Mythos 5 scores 66.0%*, just ahead of Mythos Preview's 64.7%. Opus 4.8 scores 56.9%, and GPT 5.5 scores 51.8%.

On the Legal Agent Benchmark, Mythos 5 scores 13.3%, compared to 10.4% for Opus 4.8, only 2.1% for GPT 5.5, and 0.0% for Gemini 3.1 Pro. The absolute scores are low, but the separation between Mythos 5 and GPT 5.5 or Gemini 3.1 Pro is stark. Legal reasoning remains a challenging frontier for all models.

Claude Mythos 5 Pricing and Availability

Claude Mythos 5 is priced at $10 per million input tokens and $50 per million output tokens. This is less than half the price of Claude Mythos Preview ($25/$125), which makes the upgrade straightforward for existing Glasswing partners. Developers can access the model via the Claude API using the model ID claude-mythos-5.
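To see what the price change means for a concrete workload, here is a minimal Python sketch of the arithmetic. The per-million-token prices are the ones quoted above. The function name `estimate_cost`, the `mythos-preview` label, and the example token counts are assumptions for illustration, and the estimate ignores any discounts or surcharges not mentioned in this article.

```python
from decimal import Decimal

# Prices in USD per million tokens, as quoted in the paragraph above: (input, output).
PRICES = {
    "claude-mythos-5": (Decimal("10"), Decimal("50")),
    # Illustrative label only; this is not a confirmed API model ID.
    "mythos-preview": (Decimal("25"), Decimal("125")),
}

MILLION = Decimal(1_000_000)


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of a workload from raw token counts."""
    price_in, price_out = PRICES[model]
    total = Decimal(input_tokens) * price_in + Decimal(output_tokens) * price_out
    return total / MILLION


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 400K output tokens.
    for model in PRICES:
        print(model, estimate_cost(model, 2_000_000, 400_000))
```

For this hypothetical workload, the estimate is $40 on Mythos 5 versus $100 on Mythos Preview. Because input and output prices drop by the same factor, the ratio stays at 0.4 whatever the mix of input and output tokens.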

Access is currently restricted to two groups:

All users who had access to Claude Mythos Preview through Project Glasswing, who can upgrade to Mythos 5 with cyber safeguards lifted.
A small group of biomedical researchers, who can access Mythos 5 with biology and chemistry safeguards lifted but cyber safeguards still in place.

Anthropic plans to expand both programs over time.

Anthropic also plans a broader trusted access program, developed in consultation with the US Government, through which cybersecurity organizations can apply more systematically. Anthropic has not announced a timeline for general availability. For most developers, Claude Fable 5 is the practical option today: it uses the same underlying model and is available through standard subscription and API plans.

One operational detail worth flagging: Anthropic has introduced a 30-day data retention policy for all Mythos-class model traffic. The data is not used for training and is deleted after 30 days in almost all cases, but it is retained for safety monitoring. If you're building on Mythos 5 with sensitive data, review Anthropic's support documentation on this policy before deploying.

Final Thoughts

Claude Mythos 5 is Anthropic's clearest statement yet that the company is serious about deploying frontier AI in high-stakes professional contexts, and the results back it up.

The SWE-Bench Pro gap (80.3% vs 69.2%), the Terminal-Bench 2.1 gap (88.0% vs 82.7%), and the ExploitBench gap (78.0% vs 40.0%) all point to a model that handles the hardest tasks more reliably than anything else available.

The restricted access model is a reasonable approach given the dual-use risks, and the ExploitBench scores make a compelling case that the most capable offensive security tools shouldn't be publicly available. The harder question is whether Anthropic can expand the trusted access program fast enough to be useful to the broader security and biomedical research communities before competitors close the gap.

For organizations that qualify, the upgrade from Mythos Preview is straightforward at less than half the price.

Author

Tom Farnschläder

Topics

Artificial Intelligence

AI News

Large Language Models
