
GPT-5.6 Sol, Terra, and Luna: OpenAI's Next-Generation Model Family

OpenAI's GPT-5.6 introduces the Sol, Terra, and Luna models. We break down the benchmarks, pricing, and how they compare to GPT-5.5, Claude, and Gemini.
Jun 26, 2026 · 8 min read

GPT-5.6 is a three-model series launching today into a limited preview coordinated with the U.S. government.

It's not yet widely available, but there's still plenty of capability news to dig into, especially around coding, biology, and cybersecurity.

If you want to go back in time with OpenAI model news, we have a record of write-ups for every release: GPT-5.5 Instant, GPT-5.3 Instant, and GPT-5.2.

And if you want to compare GPT-5.6 to the latest available models, we have articles for that, too: Gemini 3.1 Pro, Claude Opus 4.8, Claude Mythos 5, Claude Fable 5.

What Is GPT-5.6?

GPT-5.6 is OpenAI's newest model generation. It's actually a family of three models, and (finally) there's a new naming convention.

In the new system, the number is the generation (5.6), and the name is the capability tier (Sol, Terra, Luna). One advantage I can imagine is that each tier can advance on its own schedule, which should be less confusing. "Instant" was never the best name for the latest underlying model.
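To make the generation-plus-tier idea concrete, here is a minimal sketch that splits a display name into its two parts. The string format "GPT-<generation> <Tier>" is an assumption based on how the names are written in this article, and `ModelName` and `parse_model_name` are hypothetical helpers, not anything OpenAI ships.

```python
from dataclasses import dataclass

# Tier names come from the article. The "GPT-<generation> <Tier>" layout is an
# assumption based on how the models are written in this post.
KNOWN_TIERS = ("Sol", "Terra", "Luna")


@dataclass(frozen=True)
class ModelName:
    generation: str  # e.g. "5.6"
    tier: str        # e.g. "Sol"


def parse_model_name(name: str) -> ModelName:
    """Split a display name like 'GPT-5.6 Sol' into generation and tier."""
    prefix, _, tier = name.partition(" ")
    if not prefix.startswith("GPT-") or tier not in KNOWN_TIERS:
        raise ValueError(f"unrecognized model name: {name!r}")
    return ModelName(generation=prefix[len("GPT-"):], tier=tier)


print(parse_model_name("GPT-5.6 Terra"))
```

Because the two parts are independent, a later generation of the same tier would differ only in the `generation` field.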

What is GPT-5.6 Sol?

Sol is the flagship and the strongest, most capable model in the series. It's the tier OpenAI leads with on every benchmark in the release, and it's the only model that unlocks the new max reasoning effort and ultra mode (more on both below).

Sol is also where the cybersecurity and biology gains are most pronounced. Sol is the model to use if you want the highest ceiling on hard, multi-step problems and don't mind paying the most per token.

What is GPT-5.6 Terra?

Terra is the balanced, everyday-work model. You can think of it as the default. Terra has competitive performance with GPT-5.5 while being about 2x cheaper. This might be something we keep seeing going forward: roughly last-generation flagship quality at a mid-tier price. 

What is GPT-5.6 Luna?

Luna is the fast, affordable tier. It's aimed at high-volume, latency-sensitive, or budget-conscious workloads. As the benchmark table below will show, "cheapest" doesn't necessarily mean "weakest" on every task. That's worth mentioning so you don't overlook it.

In sum: 

  • Sol is the strongest model with the biggest safety stack
  • Terra is similar to GPT‑5.5 but cheaper
  • Luna has the lowest cost

What Else Is New with GPT-5.6?

This update also brings new ways to dial up reasoning, plus broad capability gains in a few specific domains.

Two new ways to push the model harder

GPT-5.6 introduces two new controls:

  • max reasoning effort: a new reasoning-effort setting that gives Sol the most time to think on a problem.

  • ultra mode: this one is more interesting. Instead of a single agent working a task, ultra uses subagents.

If you look at the benchmarks below, you'll see ultra shows up as its own line ("GPT-5.6 Sol Ultra"), and it posts the top score.
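The release material covered here doesn't say how ultra coordinates its subagents, so the following is only a generic fan-out/fan-in pattern to show what "several agents instead of one" can look like in code. `run_subagent` and `run_with_subagents` are hypothetical stand-ins, and nothing here calls a real model or reflects OpenAI's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(subtask: str) -> str:
    """Hypothetical stand-in for one subagent; a real one would call a model."""
    return f"done: {subtask}"


def run_with_subagents(subtasks: list[str], max_workers: int = 4) -> list[str]:
    """Fan subtasks out to parallel workers, then collect results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(run_subagent, subtasks))


results = run_with_subagents(["read logs", "write patch", "run tests"])
print(results)
```

The point of the pattern is that independent subtasks run side by side, so wall-clock time is bounded by the slowest subtask rather than the sum of all of them.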

Stronger coding

GPT-5.6 Sol sets a new state of the art on Terminal-Bench 2.1, which came out about six weeks ago. The newest version of Terminal-Bench tests command-line workflows that need iteration and tool coordination. We'll get into the actual numbers in the benchmark section.

Stronger biology

GeneBench v1 is a lesser-known benchmark that evaluates long-horizon genomics and quantitative-biology analyses. According to the release, GPT-5.6 Sol gets stronger results than GPT-5.5 while using fewer tokens.

Stronger (and more carefully handled!) cybersecurity

OpenAI calls GPT-5.6 Sol its most capable model yet for cybersecurity, and the company dedicates a lot of the release to this idea. Sol shifts the performance-efficiency frontier on long-horizon security tasks, including vulnerability research and exploitation. The release also reports results on two more lesser-known benchmarks:

  • On ExploitBench, OpenAI says GPT-5.6 Sol is competitive with the (unreleased) Mythos Preview model while using only about a third of the output tokens.
  • On ExploitGym, a benchmark built by UC Berkeley researchers, all three models show strong cyber improvements as reasoning increases.

GPT-5.6 Benchmark Results

The headline benchmark success story is the result on Terminal-Bench 2.1, which I mentioned earlier. Here's how the models stack up, sorted from highest to lowest score:

| Model | Terminal-Bench 2.1 |
| --- | --- |
| GPT-5.6 Sol Ultra | 91.9% |
| GPT-5.6 Sol | 88.8% |
| GPT-5.5 | 88.0% |
| GPT-5.6 Luna | 84.3% |
| Claude Mythos 5 | 84.3% |
| Claude Fable 5 | 83.4% |
| GPT-5.6 Terra | 82.5% |
| Claude Opus 4.8 | 78.9% |
| Gemini 3.1 Pro Preview | 70.7% |

A few things stand out:

  • ultra mode makes a difference. GPT-5.6 Sol Ultra (91.9%) sits clearly above plain Sol (88.8%), which is the cleanest evidence that the subagent approach works, at least here.

  • The tier ordering doesn't track this single benchmark perfectly. Notice that GPT-5.6 Luna (84.3%) actually scores above GPT-5.6 Terra (82.5%), even though Terra is positioned as the higher tier. And Terra lands below GPT-5.5 (88.0%) on this test, despite OpenAI describing Terra as "competitive with GPT-5.5" overall. My takeaway: tiers describe the intelligence/speed/cost balance averaged across many tasks; they don't guarantee the ordering on any one benchmark.
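The gaps discussed above are easy to check by hand, but a short script makes them explicit. The scores are copied from the Terminal-Bench 2.1 table in this article; `gap` is just a helper name chosen for this sketch.

```python
# Terminal-Bench 2.1 scores (percent), copied from the table in this article.
scores = {
    "GPT-5.6 Sol Ultra": 91.9,
    "GPT-5.6 Sol": 88.8,
    "GPT-5.5": 88.0,
    "GPT-5.6 Luna": 84.3,
    "Claude Mythos 5": 84.3,
    "Claude Fable 5": 83.4,
    "GPT-5.6 Terra": 82.5,
    "Claude Opus 4.8": 78.9,
    "Gemini 3.1 Pro Preview": 70.7,
}


def gap(a: str, b: str) -> float:
    """Score of model a minus score of model b, in percentage points."""
    return round(scores[a] - scores[b], 1)


print(gap("GPT-5.6 Sol Ultra", "GPT-5.6 Sol"))  # 3.1 points from ultra mode
print(gap("GPT-5.6 Luna", "GPT-5.6 Terra"))     # 1.8 points, Luna ahead of Terra
print(gap("GPT-5.5", "GPT-5.6 Terra"))          # 5.5 points, GPT-5.5 ahead of Terra
```

These are differences on a single benchmark, so they say nothing about how the tiers compare on other tasks.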

The Safety Stack and the Limited Preview

Here's where GPT-5.6 diverges most from a normal model launch. Since we can't test the model ourselves (more on why below), we'll focus on the most newsworthy part of this release: how it's being shipped.

A layered safeguard stack

OpenAI's argument is that no single safeguard holds up against determined or adaptive misuse, so it stacks several. Across the GPT-5.6 preview, the layers include:

  • Model-level training to refuse prohibited cyber assistance, including attempts to disguise intent or jailbreak the model.
  • Real-time classifiers for cyber and biology misuse that evaluate output as it's generated. For higher-risk cases, generation can be paused while a larger reasoning model reviews the full conversation, and the output can be withheld before it ever reaches you.
  • Account-level review that can look across a user's conversations and risk signals to distinguish persistent malicious behavior from legitimate dual-use security work.
  • Differentiated access, monitoring, enforcement, and continued testing.

OpenAI is upfront that, especially during the preview, this means some requests may be blocked or refused, and others may simply take longer because generation paused for review. It also acknowledges safeguards may occasionally trip on legitimate dual-use work, where defensive and offensive activity can look similar at first. 
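As a rough illustration of the classifier layer described above, the toy sketch below checks output as it accumulates and escalates to a slower reviewer when a risk score crosses a threshold. Everything in it is an assumption made for illustration: the function names, the threshold, and the scoring callables are hypothetical, and OpenAI's actual classifiers and review logic are not public in this level of detail.

```python
from typing import Callable, Iterable, Optional


def generate_with_safeguards(
    chunks: Iterable[str],
    risk_score: Callable[[str], float],   # hypothetical fast classifier
    deep_review: Callable[[str], bool],   # hypothetical slower reviewer
    threshold: float = 0.8,               # arbitrary value for this sketch
) -> Optional[str]:
    """Return the full output, or None if it is withheld after review."""
    emitted: list[str] = []
    for chunk in chunks:
        emitted.append(chunk)
        text_so_far = "".join(emitted)
        if risk_score(text_so_far) >= threshold:
            # Generation pauses here while the reviewer sees everything so far.
            if not deep_review(text_so_far):
                return None  # withheld before it reaches the user
    return "".join(emitted)


# Toy stand-ins: flag a marker string, and have the reviewer reject it.
flag_marker = lambda text: 1.0 if "FLAG" in text else 0.0
always_reject = lambda text: False

print(generate_with_safeguards(["hello ", "world"], flag_marker, always_reject))
print(generate_with_safeguards(["step one ", "FLAG"], flag_marker, always_reject))
```

The sketch also shows where the extra latency comes from: every call to the slower reviewer sits directly on the generation path.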

Red-teaming at scale

To harden all of this, OpenAI says it threw an unusual amount of compute at safety: over 700,000 A100-equivalent GPU hours of automated red-teaming aimed at finding universal jailbreaks, attacks that generalize across many prompts and contexts rather than working in one narrow setting. That was paired with third-party human expert red-teaming, which continues through the preview, plus a rapid-response process to reproduce and patch newly discovered jailbreaks and fold them into future testing.

Why the rollout is so cautious

OpenAI says it previewed GPT-5.6's plans and capabilities to the U.S. government ahead of launch, and that at the government's request, it's starting with a limited preview for a small group of trusted partners.

That makes this the second time in a month that the U.S. government has reached into a frontier model launch. Two weeks ago, a U.S. export-control directive forced Anthropic to pull Claude Fable 5 and Mythos 5 offline for every customer worldwide.

My read: OpenAI watched what happened to Anthropic and chose to hand over the keys up front rather than have a model yanked after it had already shipped.

A note on testing

In our usual model write-ups, this is where we'd put the new model through its paces, hands-on, on reasoning, web search, and a high-stakes prompt or two. We can't do that here yet. GPT-5.6 is in a limited preview restricted to a small set of trusted partners, so it isn't generally available to test.

How Can I Access GPT-5.6?

During the preview, GPT-5.6 models are available through the API and Codex to a select group of trusted partners and organizations. OpenAI says it plans to make them more broadly available across ChatGPT, Codex, and the API soon. 

Pricing

GPT-5.6 is priced per 1M tokens across the three tiers:

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- |
| GPT-5.6 Sol | $5.00 | $30.00 |
| GPT-5.6 Terra | $2.50 | $15.00 |
| GPT-5.6 Luna | $1.00 | $6.00 |
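To see what these rates mean for a single request, here is a small cost calculator using the list prices above. The request size (20,000 input tokens, 2,000 output tokens) is an arbitrary example, and `request_cost` is a helper written for this sketch; it ignores caching and any other billing details.

```python
# USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "Sol": (5.00, 30.00),
    "Terra": (2.50, 15.00),
    "Luna": (1.00, 6.00),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Uncached cost in USD for one request at the listed per-token rates."""
    input_price, output_price = PRICES[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


for tier in PRICES:
    # Example request: 20,000 input tokens and 2,000 output tokens.
    print(tier, round(request_cost(tier, 20_000, 2_000), 4))
```

For this example request, Sol comes to about $0.16, Terra to about $0.08, and Luna to about $0.032, which matches the 2x and 5x price ratios between the tiers.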

A few pricing details worth knowing:

  • More predictable prompt caching, including support for explicit cache breakpoints and a 30-minute minimum cache life.
  • For GPT-5.6 and later models, cache writes are billed at 1.25x the model's uncached input rate, while cache reads keep the 90% cached-input discount.
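The caching multipliers above imply a quick break-even calculation for a shared prompt prefix. This sketch assumes the first call pays the 1.25x write rate for the whole prefix and that every later call is a cache hit within the cache lifetime; real billing may differ, and `prefix_cost` is a hypothetical helper.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # cache writes: 1.25x the uncached input rate
CACHE_READ_MULTIPLIER = 0.10   # cache reads: 90% discount on the input rate


def prefix_cost(input_price_per_m: float, prefix_tokens: int,
                calls: int, cached: bool) -> float:
    """USD cost of sending the same prompt prefix `calls` times.

    Assumes one cache write on the first call and cache hits afterwards.
    """
    base = prefix_tokens * input_price_per_m / 1_000_000
    if not cached or calls == 0:
        return base * calls
    return base * (CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1))


# Example: a 100,000-token prefix on Sol ($5.00 per 1M input tokens).
for calls in (1, 2, 10):
    uncached = prefix_cost(5.00, 100_000, calls, cached=False)
    cached = prefix_cost(5.00, 100_000, calls, cached=True)
    print(calls, round(uncached, 3), round(cached, 3))
```

Under these assumptions, caching costs slightly more for a single call (the 1.25x write) and is already cheaper by the second call.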

Speed via Cerebras

OpenAI also says it's launching GPT-5.6 Sol on Cerebras at up to 750 tokens per second in July, with access initially limited to select customers as capacity expands. That's aimed squarely at latency-sensitive work, where waiting on a frontier model is the bottleneck.

Final Thoughts

GPT-5.6 brings several welcome changes: a cleaner three-tier family, two new ways to push reasoning (max and ultra), and gains in coding, biology, and cybersecurity.

Still, the models are currently in a closed preview, so no independent party can verify any of this yet. The story OpenAI tells, that GPT-5.6 is "so capable we had to coordinate with the government," feels a bit like the most flattering possible marketing. At the same time, that doesn't make it false, and caution around cybersecurity is worth appreciating.

For now, the honest summary is: the numbers look strong, the access is narrow, and the real test, literally, comes when the models become broadly available. Subscribe to our newsletter, and we will keep you up to date.


Author
Josef Waples

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess! 

FAQs

What is GPT-5.6?

GPT-5.6 is OpenAI's newest model generation. Rather than a single model, it ships as a family of three: Sol (the flagship), Terra (a balanced everyday model), and Luna (a fast, low-cost model).

What's the difference between Sol, Terra, and Luna?

They're capability tiers. Sol is the strongest and most capable model. Terra is the mid-tier, positioned as competitive with GPT-5.5 while being about 2x cheaper. Luna is the budget tier, built for strong capability at the lowest cost. In OpenAI's new naming system, the number (5.6) is the generation and the name (Sol/Terra/Luna) is the durable tier, so each tier can advance on its own schedule.

Can I use GPT-5.6 right now?

Not generally, yet. At launch, GPT-5.6 is in a limited preview available through the API and Codex to a small group of trusted partners and organizations. OpenAI says it plans to make the models more broadly available across ChatGPT, Codex, and the API soon.

Why is the GPT-5.6 rollout so limited?

OpenAI says it previewed the models' plans and capabilities to the U.S. government ahead of launch and, at the government's request, is starting with a limited preview for partners whose participation was shared with the government. OpenAI has said it doesn't think this kind of access process should become the long-term default, and frames it as a short-term step toward broader availability while it works with the Administration on a cyber Executive Order framework.

How much does GPT-5.6 cost?

Per 1M tokens: Sol is $5 input / $30 output, Terra is $2.50 input / $15 output, and Luna is $1 input / $6 output. GPT-5.6 also adds more predictable prompt caching, with cache writes billed at 1.25x the uncached input rate and cache reads keeping the 90% cached-input discount.

What are max and ultra modes?

They're two new ways to push the model harder. max is a new reasoning-effort setting that gives Sol the most time to reason deeply. ultra goes further, using subagents to accelerate complex work beyond what a single agent can do. In the release, "GPT-5.6 Sol Ultra" posts the top Terminal-Bench score.

How does GPT-5.6 perform on benchmarks?

On Terminal-Bench 2.1 (command-line/coding workflows), GPT-5.6 Sol Ultra leads at 91.9%, with plain Sol at 88.8%, ahead of GPT-5.5 (88.0%) and competitors like Claude Mythos 5 (84.3%), Claude Fable 5 (83.4%), Claude Opus 4.8 (78.9%), and Gemini 3.1 Pro Preview (70.7%). OpenAI also reports biology gains on GeneBench v1 and cybersecurity gains on ExploitBench and ExploitGym, with a fuller evaluation suite promised at broad release.

How is GPT-5.6 different from GPT-5.5?

GPT-5.6 is a new generation with a three-tier family and new max/ultra reasoning controls, plus gains in coding, biology, and cybersecurity. It also ships with a heavier safety stack and a more cautious, government-coordinated rollout. By comparison, GPT-5.5 was a broadly available default-model update focused on conversation quality, fewer hallucinations, and personalization.

When will GPT-5.6 come to ChatGPT?

OpenAI hasn't given a firm date, saying only that it plans to bring the models to ChatGPT, Codex, and the API "soon," and to expand availability "in the coming weeks." A Cerebras-hosted version of Sol (up to 750 tokens/second) is also slated for July, initially for select customers.
