GPT-5.6 Sol, Terra, and Luna: OpenAI's Next-Generation Model Family

OpenAI's GPT-5.6 introduces the Sol, Terra, and Luna models. We break down the benchmarks, pricing, and how they compare to GPT-5.5, Claude, and Gemini.

26 Jun 2026 · 8 min read

GPT-5.6 is a three-model series launching today into a limited preview coordinated with the U.S. government.

It's not yet available for wide use, but there's still plenty of capability news to dig into, especially around coding, biology, and cybersecurity, all of which we will get into.

If you want to go back in time with OpenAI model news, we have a record of write-ups for every release: GPT-5.5 Instant, GPT-5.3 Instant, and GPT-5.2.

And if you want to compare GPT-5.6 to the latest available models, we have articles for that, too: Gemini 3.1 Pro, Claude Opus 4.8, Claude Mythos 5, Claude Fable 5.

What Is GPT-5.6?

GPT-5.6 is OpenAI's newest model generation. It's actually a family of three models, and (finally) there's a new naming convention.

In the new system, the number is the generation (5.6), and the name is the capability tier (Sol, Terra, Luna). One advantage I imagine of the new convention is that each tier can advance on its own schedule, which should be less confusing. "Instant" wasn't the best name for the latest underlying model.

What is GPT-5.6 Sol?

Sol is the flagship and the most capable model in the series. It's the tier OpenAI leads with on every benchmark in the release, and it's the only model that unlocks the new max reasoning effort and ultra mode (more on both below).

Sol is also where the cybersecurity and biology gains are most pronounced. Sol is the model to use if you want the highest ceiling on hard, multi-step problems and don't mind paying the most per token.

What is GPT-5.6 Terra?

Terra is the balanced, everyday-work model. You can think of it as the default. Terra is competitive with GPT-5.5 at roughly half the price. This might be something we keep seeing going forward: roughly last-generation flagship quality at a mid-tier price.

What is GPT-5.6 Luna?

Luna is the fast, affordable tier. It's aimed at high-volume, latency-sensitive, or budget-conscious workloads. As the benchmark table below will show, "cheapest" doesn't necessarily mean "weakest" on every task. That's worth mentioning so you don't overlook it.

In sum:

Sol is the strongest model with the biggest safety stack.
Terra is similar to GPT-5.5 but cheaper.
Luna has the lowest cost.

What Else Is New with GPT-5.6?

This update also brings new ways to dial up reasoning, plus broad capability gains in a few specific domains.

Two new ways to push the model harder

GPT-5.6 introduces two new controls:

max reasoning effort: a new reasoning-effort setting that gives Sol the most time to think on a problem.
ultra mode: this one is more interesting. Instead of a single agent working a task, ultra uses subagents.

If you look at the benchmarks below, you'll see ultra shows up as its own line ("GPT-5.6 Sol Ultra"), and it posts the top score.

Stronger coding

GPT-5.6 Sol sets a new state of the art on Terminal-Bench 2.1, which came out about six weeks ago. The newest version of Terminal-Bench tests command-line workflows that need iteration and tool coordination. We'll get into the actual numbers in the benchmark section.

Stronger biology

GeneBench v1 is a lesser-known benchmark that evaluates long-horizon genomics and quantitative-biology analyses. According to the release, GPT-5.6 Sol gets stronger results than GPT-5.5 while using fewer tokens.

Stronger (and more carefully handled!) cybersecurity

OpenAI calls GPT-5.6 Sol its most capable model yet for cybersecurity, and the company devotes a lot of the release to this idea. Sol shifts the performance-efficiency frontier on long-horizon security tasks, including vulnerability research and exploitation. The release also reports results on two more lesser-known benchmarks:

On ExploitBench, OpenAI says GPT-5.6 Sol is competitive with the (unreleased) Mythos Preview model while using only about a third of the output tokens.
On ExploitGym, a benchmark built by UC Berkeley researchers, all three models show strong cyber improvements as reasoning increases.

GPT-5.6 Benchmark Results

The headline benchmark success story is the result on Terminal-Bench 2.1, which I mentioned earlier. Here's how the models stack up, sorted from highest to lowest score:

Model	Terminal-Bench 2.1
GPT-5.6 Sol Ultra	91.9%
GPT-5.6 Sol	88.8%
GPT-5.5	88.0%
GPT-5.6 Luna	84.3%
Claude Mythos 5	84.3%
Claude Fable 5	83.4%
GPT-5.6 Terra	82.5%
Claude Opus 4.8	78.9%
Gemini 3.1 Pro Preview	70.7%

A few things stand out:

ultra mode makes a difference. GPT-5.6 Sol Ultra (91.9%) sits clearly above plain Sol (88.8%), which is the cleanest evidence that the subagent approach works, at least here.
The tier ordering doesn't track this single benchmark perfectly. Notice that GPT-5.6 Luna actually scores above GPT-5.6 Terra, even though Terra is positioned as the higher tier. And Terra lands below GPT-5.5 (88.0%) on this test, despite OpenAI describing Terra as "competitive with GPT-5.5" overall. I think the lesson is this: tiers describe the intelligence/speed/cost balance averaged across many tasks; they are not a guarantee on any one benchmark.
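
To put those two observations in numbers, here is a small Python sketch that uses only scores from the table above. The helper name `failure_reduction` is my own, and treating 100% minus the score as a "failure rate" is a simplifying assumption on my part, not a metric OpenAI reports.

```python
# Scores copied from the Terminal-Bench 2.1 table above (percent).
scores = {
    "GPT-5.6 Sol Ultra": 91.9,
    "GPT-5.6 Sol": 88.8,
    "GPT-5.5": 88.0,
    "GPT-5.6 Luna": 84.3,
    "GPT-5.6 Terra": 82.5,
}


def failure_reduction(baseline: float, improved: float) -> float:
    """Relative drop in unsolved tasks, treating (100 - score) as the failure rate."""
    base_fail = 100.0 - baseline
    new_fail = 100.0 - improved
    return (base_fail - new_fail) / base_fail


gap = scores["GPT-5.6 Sol Ultra"] - scores["GPT-5.6 Sol"]
cut = failure_reduction(scores["GPT-5.6 Sol"], scores["GPT-5.6 Sol Ultra"])

print(f"Ultra vs. Sol: +{gap:.1f} points, {cut:.0%} fewer failed tasks")
print("Luna above Terra:", scores["GPT-5.6 Luna"] > scores["GPT-5.6 Terra"])
```

Seen this way, a 3.1-point gain near the top of the scale is a bigger deal than it sounds, because it removes more than a quarter of the tasks plain Sol still fails.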

The Safety Stack and the Limited Preview

Here's where GPT-5.6 diverges most from a normal model launch. We can't test the model ourselves (more on why below), so the most newsworthy part of this release is how it's being shipped.

A layered safeguard stack

OpenAI's argument is that no single safeguard holds up against determined or adaptive misuse, so it stacks several. Across the GPT-5.6 preview, the layers include:

Model-level training to refuse prohibited cyber assistance, including attempts to disguise intent or jailbreak the model.
Real-time classifiers for cyber and biology misuse that evaluate output as it's generated. For higher-risk cases, generation can be paused while a larger reasoning model reviews the full conversation, and the output can be withheld before it ever reaches you.
Account-level review that can look across a user's conversations and risk signals to distinguish persistent malicious behavior from legitimate dual-use security work.
Differentiated access, monitoring, enforcement, and continued testing.

OpenAI is upfront that, especially during the preview, this means some requests may be blocked or refused, and others may simply take longer because generation paused for review. It also acknowledges safeguards may occasionally trip on legitimate dual-use work, where defensive and offensive activity can look similar at first.

Red-teaming at scale

To harden all of this, OpenAI says it threw an unusual amount of compute at safety: over 700,000 A100-equivalent GPU hours of automated red-teaming aimed at finding universal jailbreaks, attacks that generalize across many prompts and contexts rather than working in one narrow setting. That was paired with third-party human expert red-teaming, which continues through the preview, plus a rapid-response process to reproduce and patch newly discovered jailbreaks and fold them into future testing.

Why the rollout is so cautious

OpenAI says it previewed GPT-5.6's plans and capabilities to the U.S. government ahead of launch, and that at the government's request, it's starting with a limited preview for a small group of trusted partners.

That makes this the second time in a month the U.S. government has reached into a frontier model launch: two weeks ago, a U.S. export-control directive forced Anthropic to pull Claude Fable 5 and Mythos 5 offline for every customer worldwide.

My read: OpenAI watched what happened to Anthropic and chose to hand over the keys up front rather than get a model yanked after it had already shipped.

A note on testing

In our usual model write-ups, this is where we'd put the new model through its paces, hands-on, on reasoning, web search, and a high-stakes prompt or two. We can't do that here yet. GPT-5.6 is in a limited preview restricted to a small set of trusted partners, so it isn't generally available to test.

How Can I Access GPT-5.6?

During the preview, GPT-5.6 models are available through the API and Codex to a select group of trusted partners and organizations. OpenAI says it plans to make them more broadly available across ChatGPT, Codex, and the API soon.

Pricing

GPT-5.6 is priced per 1M tokens across the three tiers:

Model	Input (per 1M tokens)	Output (per 1M tokens)
GPT-5.6 Sol	$5.00	$30.00
GPT-5.6 Terra	$2.50	$15.00
GPT-5.6 Luna	$1.00	$6.00
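
To see what those rates mean per request, here is a rough Python sketch. The prices come straight from the table; the tier keys (`sol`, `terra`, `luna`) are just labels I picked, not confirmed API model IDs, and the 20,000-in / 2,000-out workload is a made-up example.

```python
# Prices per 1M tokens, copied from the table above (USD).
PRICES = {
    "sol": {"input": 5.00, "output": 30.00},
    "terra": {"input": 2.50, "output": 15.00},
    "luna": {"input": 1.00, "output": 6.00},
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one uncached request at list price."""
    p = PRICES[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
for tier in PRICES:
    print(f"{tier:>5}: ${request_cost(tier, 20_000, 2_000):.4f}")
```

Two patterns fall out of the table: output costs 6x input in every tier, and Sol is 2x Terra and 5x Luna across the board, so the same request costs five times more on Sol than on Luna.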

A few pricing details worth knowing:

More predictable prompt caching, including support for explicit cache breakpoints and a 30-minute minimum cache life.
For GPT-5.6 and later models, cache writes are billed at 1.25x the model's uncached input rate, while cache reads keep the 90% cached-input discount.
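
The cache pricing above raises an obvious question: when does caching pay for itself? Here is a back-of-the-envelope Python sketch. It assumes a single cache write on the first call and cache reads on every later call, with all calls landing inside the cache lifetime; the function name and the 50,000-token prefix are illustrative, not anything from OpenAI's docs.

```python
def prefix_cost(input_rate: float, prefix_tokens: int, calls: int, cached: bool) -> float:
    """USD cost of sending the same prompt prefix `calls` times (calls >= 1).

    Assumes one cache write (1.25x the input rate) on the first call and
    cache reads (90% off, so 0.10x) on every later call.
    """
    per_token = input_rate / 1_000_000
    if not cached:
        return calls * prefix_tokens * per_token
    write = 1.25 * prefix_tokens * per_token
    reads = 0.10 * prefix_tokens * per_token * (calls - 1)
    return write + reads


SOL_INPUT = 5.00  # USD per 1M input tokens, from the pricing table

for calls in (1, 2, 10):
    plain = prefix_cost(SOL_INPUT, 50_000, calls, cached=False)
    cached = prefix_cost(SOL_INPUT, 50_000, calls, cached=True)
    print(f"{calls:>2} calls: uncached ${plain:.4f} vs cached ${cached:.4f}")
```

Under these assumptions, a prefix that is used only once costs 25% more with caching, but the write premium is already recovered on the second call.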

Speed via Cerebras

OpenAI also says it's launching GPT-5.6 Sol on Cerebras at up to 750 tokens per second in July, with access initially limited to select customers as capacity expands. That's aimed squarely at latency-sensitive work, where waiting on a frontier model is the bottleneck.
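
For a feel of what 750 tokens per second means in practice, here is a quick Python estimate. It treats the "up to" figure as a steady decode rate and ignores time to first token and any reasoning tokens, so read the results as best-case lower bounds. The response lengths are arbitrary examples.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a steady decode rate."""
    return output_tokens / tokens_per_second


PEAK_TPS = 750  # "up to" figure from the announcement, so a best case

for n in (500, 3_000, 15_000):
    print(f"{n:>6} tokens: at least {generation_seconds(n, PEAK_TPS):.1f} s")
```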

Final Thoughts

GPT-5.6 brings several welcome changes: a cleaner three-tier family, two new ways to push reasoning (max and ultra), and gains in coding, biology, and cybersecurity.

Still, the model is currently in a closed preview, so no independent party can verify any of it yet. The story OpenAI tells, that GPT-5.6 is "so capable we had to coordinate with the government," feels a bit like the most flattering possible marketing. At the same time, that doesn't make it false, and the caution around cybersecurity deserves credit.

For now, the honest summary is: the numbers look strong, the access is narrow, and the real test, literally, comes when the models become broadly available. Subscribe to our newsletter, and we will keep you up to date.

Author

Josef Waples
