跳至内容

Claude Mythos 5: Features, Benchmarks, and What It Can Do

Anthropic's most capable model yet, Claude Mythos 5 brings Mythos-class AI to cybersecurity, drug design, and scientific research with the safeguards lifted for trusted partners.
更新 2026年6月9日  · 11分钟

Anthropic launched two models on June 9, 2026: Claude Fable 5, the public-facing Mythos-class model with conservative safety guardrails, and Claude Mythos 5, the same underlying model with those guardrails lifted for a select group of trusted partners. This article is about Mythos 5, the version Anthropic describes as having "the strongest cybersecurity capabilities of any model in the world."

Mythos 5 is priced at $10 per million input tokens and $50 per million output tokens, less than half the cost of Claude Mythos Preview. It is currently restricted to Project Glasswing cybersecurity partners and a small group of biomedical researchers, with a broader trusted access program planned.

In this article, I'll cover what Claude Mythos 5 is, what it can do across software engineering, life sciences, and scientific research, how it performs on benchmarks, and who can access it. You can also check out our coverage of Claude Opus 4.8 for context on where Mythos 5 sits relative to Anthropic's broader model family.

If you want to get up to speed with the AI landscape and understand where models like Mythos 5 fit, I'd recommend starting with our AI Fundamentals skill track.

What Is Claude Mythos 5?

Claude Mythos 5 is Anthropic's highest-capability model, sitting above the Opus class in what Anthropic calls the Mythos tier. The first Mythos-class model, Claude Mythos Preview, was released in April 2026 through Project Glasswing, a collaboration with the US Government focused on cybersecurity. Mythos 5 is the second release in this tier and a direct upgrade to Mythos Preview.

Mythos 5 and Fable 5 share the same underlying architecture. The difference is the safeguards: Fable 5 ships with classifiers that route sensitive cybersecurity and biology queries to Claude Opus 4.8 instead. Mythos 5 has those classifiers lifted in specific areas for partners who have been vetted through the trusted access program. Anthropic is explicit that the name difference reflects the safeguard difference, not a capability difference.

The headline benchmark claim is that Mythos 5 scores 80.3% on SWE-bench Pro, compared to 77.8% for Mythos Preview and 69.2% for Opus 4.8. On Humanity's Last Exam with tools, it scores 64.5%, ahead of Opus 4.8's 57.9% and GPT-5.5's 52.2%. These are not marginal improvements over the Opus class.

Introduction to Claude Models

Learn how to work with Claude using the Anthropic API to solve real-world tasks and build AI-powered applications.
Explore Course

What's New With Claude Mythos 5?

Mythos 5 represents a step up from Mythos Preview across every major capability area Anthropic has tested. The gains are most visible in long-horizon autonomous work, especially in scientific domains of scientific reasoning and vision tasks. Here's what that looks like in practice.

Autonomous software engineering at scale

Mythos 5 can work autonomously on large codebases for longer than any previous Claude model. Stripe reported during early testing that the model compressed months of engineering work into days, completing a codebase-wide migration across a 50-million-line Ruby codebase in a single day. That migration would have taken a full engineering team over two months by hand.

On Cognition's FrontierCode evaluation, which tests high-quality and maintainable agentic coding rather than just raw task completion, Mythos 5 scores highest among frontier models even at medium effort. The token efficiency gain matters here: getting to a correct, maintainable solution in fewer tokens reduces both cost and latency for production workflows.

For security-focused engineering, Mythos 5 inherits and extends the capabilities that made Mythos Preview valuable to Project Glasswing partners. Those partners used Mythos Preview to identify over 10,000 high and critical security flaws across production systems. Mythos 5 brings stronger agentic hacking capabilities, which is precisely why the cyber safeguards exist for the public Fable 5 version.

Drug design and protein engineering

Anthropic's internal protein design team used Mythos 5 to accelerate aspects of the drug design process by roughly ten times. In a controlled comparison, Mythos 5 with protein design and bioinformatics tools but no human assistance matched or beat skilled human operators on the full pipeline: choosing binding sites, selecting and running protein design tools, and recovering from failures.

Across 14 protein targets spanning immune checkpoints, growth-factor signaling, neurodegeneration, and muscle disease, nine yielded strong drug design candidates that Anthropic is currently investigating. The model handled the complete scientific workflow autonomously, not just individual steps.

This capability is dual-use, which is why biology access in Mythos 5 is gated separately from cyber access. Anthropic is opening a trusted access program for biomedical researchers that lifts the biology and chemistry safeguards while keeping the cyber safeguards in place.

Novel scientific hypothesis generation

Mythos 5 is Anthropic's first model to consistently produce novel, compelling scientific hypotheses rather than just summarizing existing literature. In blinded head-to-head comparisons against Opus-class models, Anthropic's scientists preferred Mythos 5's molecular biology hypotheses roughly 80% of the time, and several have been advanced to experimental evaluation.

One hypothesis about a novel mechanism for an E. coli protein was independently corroborated by a lab working on the same problem, published on bioRxiv in March 2026. That's a meaningful signal: the model generated a hypothesis that held up against independent experimental work, not just internal review.

Autonomous genomics research

In one of the more striking capability demonstrations, Mythos 5 conducted novel genomics research over more than a week of largely autonomous work. It assembled single-cell data for millions of cells across 138 animal species, then designed and trained a custom machine learning model to identify cells performing equivalent roles across distantly related organisms.

The trained model outperformed a recent model published in the journal Science, despite being 100 times smaller. Anthropic plans to publish these results. The key detail here is the autonomy: the model received only high-level human input and handled the full research pipeline, from data assembly through model design and training.

Vision and long-context performance

Mythos 5 scores 93.2% on CharXiv Reasoning with tools, compared to 91.0% for Opus 4.8. According to Anthropic, it can extract precise numbers from detailed scientific figures and rebuild a web application's source code from screenshots alone, with no additional scaffolding.

On long-context tasks, Mythos 5 stays focused across millions of tokens and uses its own notes to improve outputs over time. When tested on the deck-building game Slay the Spire with persistent file-based memory, giving Mythos 5 access to memory improved its performance three times more than the same setup improved Opus 4.8's performance. It also reached the game's final act three times more often than Opus 4.8.

Claude Mythos 5 Benchmarks

Mythos 5 leads or ties on nearly every benchmark Anthropic tested, with gains over Opus 4.8 that are consistent across categories rather than concentrated in one area. The comparison table pits it against Claude Mythos Preview, Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro.

Anthropic reports Mythos 5 and Fable 5 scores together, noting they fall within 1–3 percentage points in most cases. Benchmarks marked with an asterisk (*) show a larger gap because Fable 5's safety classifiers route sensitive queries to Opus 4.8; on those benchmarks, Fable 5 performs closer to the Opus class.

Agentic coding: SWE-Bench Pro, FrontierCode, and Terminal-Bench 2.1

On SWE-Bench Pro, Mythos 5 scores 80.3%, compared to 77.8% for Mythos Preview, 69.2% for Opus 4.8, 58.6% for GPT 5.5, and 54.2% for Gemini 3.1 Pro. The 11-point gap over Opus 4.8 is substantial on a benchmark designed to resist ground-truth leakage.

For high-quality and maintainable agentic code rather than raw task completion, measured by FrontierCode (Diamond), the separation is even sharper. Mythos 5 scores 29.3% at the xhigh effort level, compared to 13.4% for Opus 4.8 and 5.7% for GPT 5.5.

For terminal work, Mythos 5 takes the crown back to Anthropic from OpenAI: Mythos 5 scores 88.0%* on Terminal-Bench 2.1, compared to 82.7% for Opus 4.8, 83.4% for GPT 5.5 (Codex CLI), and 70.7% for Gemini 3.1 Pro (with Gemini CLI).

Knowledge work: GDPval-AA and GDPpdf

GDPval-AA measures knowledge work performance on a numerical scale. Mythos 5 scores 1932, compared to 1890 for Opus 4.8, 1769 for GPT 5.5, and 1314 for Gemini 3.1 Pro.

This gap is elevated for knowledge work on PDF documents without tool access.  Mythos 5 scores 29.8% on GDPpdf, compared to 22.5% for Opus 4.8, 24.9% for GPT 5.5, and 16.7% for Gemini 3.1 Pro.

Multidisciplinary reasoning: Humanity's Last Exam

Humanity's Last Exam (HLE) tests graduate-level reasoning across science, mathematics, and humanities. Mythos 5 scores 59.0%* without tools and 64.5%* with tools. Mythos Preview scores 56.8% and 64.7% respectively—essentially tied with tools but trailing by 2 points without. The distance to Opus 4.8 is already significant (49.8% without, 57.9% with), but even bigger to the flagship competitor models (GPT 5.5: 41.4% and 52.2%, Gemini 3.1 Pro: 44.4% and 51.4%).

The gap between Mythos 5 and the rest of the field is clearest in the no-tools condition, where it leads Opus 4.8 by over 9 points. These are starred scores, meaning Fable 5 performs somewhat lower due to its safety classifiers.

Computer use, tool use, and spatial reasoning

On OSWorld-Verified, which tests the model's ability to complete tasks on a real computer interface, Mythos 5 scores 85.0%. Mythos Preview edges ahead at 85.4%, making this the only benchmark where Mythos Preview leads. Opus 4.8 comes quite close (83.4%), with the competitors falling behind a bit. GPT 5.5 scores 78.7%, and Gemini 3.1 Pro scores 76.2%.

AutomationBench measures tool use capabilities. Mythos 5 scores 17.4%, compared to 15.5% for Opus 4.8, 12.9% for GPT 5.5, and 9.6% for Gemini 3.1 Pro. The low absolute numbers across the board suggest tool use remains a hard problem for all frontier models.

Spatial reasoning is one area where Mythos 5's lead is the biggest. It scores 38.6% in Blueprint-Bench 2, more than double that of Opus 4.8's 14.5%. GPT 5.5 is closer at 36.2%, and Gemini 3.1 Pro scores 26.5%.

Cybersecurity and biology

Those were the two areas that arguably received the most attention in the release notes, and the results show us why.

ExploitBench measures the fraction of exploits the model can successfully reproduce (Cap%). Mythos 5 scores 78.0%*, which is even a significant improvement from Mythos Preview (69.0%), and a dramatic increase compared to Opus 4.8's 40.0% for Opus 4.8 and 34.0% for GPT 5.5.

The 38-point gap over Opus 4.8 is the largest single-benchmark lead in the comparison table, and it explains why the cyber safeguards exist for Fable 5. Anthropic's external red-teaming found no universal jailbreaks on long-form agentic tasks, though the UK AISI made progress toward one in an initial testing window.

BioMysteryBench tests biological reasoning at two difficulty levels. On the hard subset, Mythos 5 scores 46.1%*, compared to 29.6% for Mythos Preview and 40.0% for Opus 4.8. On the human-solved subset, Mythos 5 scores 83.9%*, Mythos Preview scores 82.6%, and Opus 4.8 scores 80.4%. GPT 5.5 and Gemini 3.1 Pro do not have reported scores on either subset.

As with ExploitBench, Fable 5's scores are closer to Opus 4.8 due to biology-related safety classifiers.

Claude Mythos 5 demonstrates notable strength in two high-stakes professional domains where accuracy and reasoning quality carry real-world consequences: medicine and law.

In HealthBench Professional, Mythos 5 scores 66.0%*, just over Mythos Preview's 64.7%. Opus 4.8 scores 56.9%, and GPT 5.5 scores 51.8%.

On the Legal Agent Benchmark, Mythos 5 scores 13.3%, compared to 10.4% for Opus 4.8, and only 2.1% for GPT 5.5. The absolute scores are low, but the separation between Mythos 5 and GPT 5.5 or Gemini is stark. Legal reasoning remains a challenging frontier for all models.

Claude Mythos 5 Pricing and Availability

Claude Mythos 5 is priced at $10 per million input tokens and $50 per million output tokens. This is less than half the price of Claude Mythos Preview ($25/$125), which makes the upgrade straightforward for existing Glasswing partners. Developers can access the model via the Claude API using the model ID claude-mythos-5.

Access is currently restricted to two groups:

  • All users who had access to Claude Mythos Preview through Project Glasswing can upgrade to Mythos 5 with cyber safeguards lifted
  • A small group of biomedical researchers that can access Mythos 5 with biology and chemistry safeguards lifted, but cyber safeguards still in place.

Anthropic plans to expand both programs over time.

A broader trusted access program is planned for cybersecurity organizations to apply more systematically, in consultation with the US Government. Anthropic has not announced a timeline for general availability. For most developers, Claude Fable 5 is the practical option today, with the same underlying model and access via standard subscription and API plans.

One operational detail worth flagging: Anthropic has introduced a 30-day data retention policy for all Mythos-class model traffic. The data is not used for training and is deleted after 30 days in almost all cases, but it is retained for safety monitoring. If you're building on Mythos 5 with sensitive data, review Anthropic's support documentation on this policy before deploying.

Final Thoughts

Claude Mythos 5 is Anthropic's clearest statement yet that the company is serious about deploying frontier AI in high-stakes professional contexts, and the results back it up.

The SWE-bench Pro gap (80.3% vs 69.2%), the Terminal-Bench 2.1 gap (88.0% vs 82.7%), and the ExploitBench gap (78.0% vs 40.0%) all point to a model that handles the hardest tasks more reliably than anything else available.

The restricted access model is a reasonable approach given the dual-use risks, and the ExploitBench scores make a compelling case that the most capable offensive security tools shouldn't be publicly available. The harder question is whether Anthropic can expand the trusted access program fast enough to be useful to the broader security and biomedical research communities before competitors close the gap.

For organizations that qualify, the upgrade from Mythos Preview is straightforward at less than half the price.


Tom Farnschläder's photo
Author
Tom Farnschläder
LinkedIn

Tom is a data scientist and technical educator. He writes and manages DataCamp's data science tutorials and blog posts. Previously, Tom worked in data science at Deutsche Telekom.

主题

Top AI Courses

Tracks

AI 基础知识

10小时
探索 AI 基础,学习如何在工作中有效利用 AI,并深入了解 ChatGPT 等模型,以驾驭快速变化的 AI 领域。
查看详情Right Arrow
开始课程
查看更多Right Arrow
有关的

blogs

Claude Opus 4.7: Anthropic’s New Best (Available) Model

Explore what's new in Anthropic's latest flagship: stronger agentic coding, sharper vision, and better memory across sessions. Compare the benchmarks against GPT-5.4, Gemini 3.1 Pro, and the locked-away Mythos Preview.
Josef Waples's photo

Josef Waples

9分钟

blogs

Claude Opus 4.6: Features, Benchmarks, Hands-On Tests, and More

Anthropic’s latest model tops leaderboards in agentic coding and complex reasoning. Plus, it has a 1M context window.
Matt Crabtree's photo

Matt Crabtree

10分钟

blogs

Claude Opus 4.5: Benchmarks, Agents, Tools, and More

Discover Claude Opus 4.5 by Anthropic, its best model yet for coding, agents, and computer use. See benchmark results, new tools, and real-world tests.
Josef Waples's photo

Josef Waples

10分钟

blogs

Claude Sonnet 4.6: Features, Access, Tests, and Benchmarks

Explore Anthropic’s Claude Sonnet 4.6, featuring a 1M token context window, near-Opus performance, and advanced agentic capabilities for coding and finance.
Tom Farnschläder's photo

Tom Farnschläder

10分钟

blogs

Anthropic Computer Use: Automate Your Desktop With Claude 3.5

Discover Anthropic’s new computer use feature and let Claude manage your workspace and automate your tasks. Simply type the prompt, and Claude will handle the rest.
Abid Ali Awan's photo

Abid Ali Awan

9分钟

Claude Sonnet 4.5 hailed as the best at coding in the world

blogs

Claude Sonnet 4.5: Tests, Features, Access, Benchmarks, and More

Learn about Claude Sonnet 4.5, the ‘best coding model in the world’. Explore new features, use cases, benchmarks, and testing results, plus a look at the Claude Agents SDK and Claude Imagine.
Matt Crabtree's photo

Matt Crabtree

8分钟

查看更多查看更多