GPT-5.6 is a three-model series that today is launching into a limited preview coordinated with the U.S. government.
It's not yet available for wide use, but there's still plenty of capability news to dig into, especially around coding, biology, and cybersecurity, all of which we'll get into.
If you want to go back in time with OpenAI model news, we have a record of write-ups for every release: GPT-5.5 Instant, GPT-5.3 Instant, and GPT-5.2.
And if you want to compare GPT-5.6 to the latest available models, we have articles for that, too: Gemini 3.1 Pro, Claude Opus 4.8, Claude Mythos 5, Claude Fable 5.
What Is GPT-5.6?
GPT-5.6 is OpenAI's newest model generation. It's actually a family of three models, and (finally) there's a new naming convention.
In the new system, the number is the generation (5.6), and the name is the capability tier (Sol, Terra, Luna). One advantage of the new convention, I imagine, is that each tier can advance on its own schedule, which should be less confusing. "Instant" wasn't the best name to mean the latest underlying model.
What is GPT-5.6 Sol?
Sol is the flagship and the strongest, most capable model in the series. It's the tier OpenAI leads with on every benchmark in the release, and it's the only model that unlocks the new max reasoning effort and ultra mode (more on both below).
Sol is also where the cybersecurity and biology gains are most pronounced. Sol is the model to use if you want the highest ceiling on hard, multi-step problems and don't mind paying the most per token.
What is GPT-5.6 Terra?
Terra is the balanced, everyday-work model. You can think of it as the default. Terra has competitive performance with GPT-5.5 while being about 2x cheaper. This might be something we keep seeing going forward: roughly last-generation flagship quality at a mid-tier price.
What is GPT-5.6 Luna?
Luna is the fast, affordable tier. It's aimed at high-volume, latency-sensitive, or budget-conscious workloads. As the benchmark table below will show, "cheapest" doesn't necessarily mean "weakest" on every task. That's worth mentioning so you don't overlook it.
In sum:
- Sol is the strongest model with the biggest safety stack
- Terra is similar to GPT‑5.5 but cheaper
- Luna has the lowest cost.
What Else Is New with GPT-5.6?
This update also brings new ways to dial up reasoning, plus broad capability gains in a few specific domains.
Two new ways to push the model harder
GPT-5.6 introduces two new controls:
- max reasoning effort: a new reasoning-effort setting that gives Sol the most time to think on a problem.
- ultra mode: this one is more interesting. Instead of a single agent working a task, ultra uses subagents.
If you look at the benchmarks below, you'll see ultra shows up as its own line ("GPT-5.6 Sol Ultra"), and it posts the top score.
Stronger coding
GPT-5.6 Sol sets a new state of the art on Terminal-Bench 2.1, which came out about six weeks ago. The newest version of Terminal-Bench tests command-line workflows that need iteration and tool coordination. We'll get into the actual numbers in the benchmark section.
Stronger biology
This one is a lesser-known benchmark: GeneBench v1. It evaluates long-horizon genomics and quantitative-biology analyses. According to the release, GPT-5.6 Sol gets stronger results than GPT-5.5 while using fewer tokens.
Stronger (and more carefully handled!) cybersecurity
OpenAI calls GPT-5.6 Sol its most capable model yet for cybersecurity, and the company dedicates a lot of space in the release to this idea. Sol shifts the performance-efficiency frontier on long-horizon security tasks, including vulnerability research and exploitation. The release reports results on two more lesser-known benchmarks:
- On ExploitBench, OpenAI says GPT-5.6 Sol is competitive with the (unreleased) Mythos Preview model while using only about a third of the output tokens.
- On ExploitGym, a benchmark built by UC Berkeley researchers, all three models show strong cyber improvements as reasoning increases.
GPT-5.6 Benchmark Results
The headline benchmark success story is the result on Terminal-Bench 2.1, which I mentioned earlier. Here's how the models stack up, sorted from highest to lowest score:
| Model | Terminal-Bench 2.1 |
|---|---|
| GPT-5.6 Sol Ultra | 91.9% |
| GPT-5.6 Sol | 88.8% |
| GPT-5.5 | 88.0% |
| GPT-5.6 Luna | 84.3% |
| Claude Mythos 5 | 84.3% |
| Claude Fable 5 | 83.4% |
| GPT-5.6 Terra | 82.5% |
| Claude Opus 4.8 | 78.9% |
| Gemini 3.1 Pro Preview | 70.7% |
A few things stand out:
- ultra mode makes a difference. GPT-5.6 Sol Ultra (91.9%) sits clearly above plain Sol (88.8%), which is the cleanest evidence that the subagent approach works, at least here.
- The tier ordering doesn't track this single benchmark perfectly. Notice that GPT-5.6 Luna actually scores above GPT-5.6 Terra, even though Terra is positioned as the higher tier. And Terra lands below GPT-5.5 (88.0%) on this test, despite OpenAI describing Terra as "competitive with GPT-5.5" overall. I think the lesson is this: tiers describe the intelligence/speed/cost balance averaged across many tasks, not a guarantee on any one benchmark.
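The comparisons above are easy to reproduce from the table. Here is a minimal Python sketch that hard-codes the reported Terminal-Bench 2.1 scores and checks the two observations; the variable names are my own, and the numbers are simply copied from the release, not independently measured.

```python
# Terminal-Bench 2.1 scores as reported in the release (percent).
scores = {
    "GPT-5.6 Sol Ultra": 91.9,
    "GPT-5.6 Sol": 88.8,
    "GPT-5.5": 88.0,
    "GPT-5.6 Luna": 84.3,
    "Claude Mythos 5": 84.3,
    "Claude Fable 5": 83.4,
    "GPT-5.6 Terra": 82.5,
    "Claude Opus 4.8": 78.9,
    "Gemini 3.1 Pro Preview": 70.7,
}

# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Gain from ultra mode over plain Sol, in percentage points.
ultra_gain = round(scores["GPT-5.6 Sol Ultra"] - scores["GPT-5.6 Sol"], 1)

# Does the cheaper tier outscore the mid tier on this benchmark?
luna_beats_terra = scores["GPT-5.6 Luna"] > scores["GPT-5.6 Terra"]

# How far Terra sits below GPT-5.5 here, in percentage points.
terra_gap_to_gpt55 = round(scores["GPT-5.5"] - scores["GPT-5.6 Terra"], 1)

for model, score in ranked:
    print(f"{model:<24} {score:.1f}%")
print(f"ultra gain over Sol: {ultra_gain} points")
print(f"Luna above Terra: {luna_beats_terra}")
print(f"Terra below GPT-5.5 by: {terra_gap_to_gpt55} points")
```

A single benchmark is a thin basis for ranking tiers, so treat these gaps as a snapshot rather than a general ordering.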
The Safety Stack and the Limited Preview
Here's where GPT-5.6 diverges most from a normal model launch. We can't test the model ourselves yet (more on why below), so the most newsworthy part of this release is how it's being shipped.
A layered safeguard stack
OpenAI's argument is that no single safeguard holds up against determined or adaptive misuse, so it stacks several. Across the GPT-5.6 preview, the layers include:
- Model-level training to refuse prohibited cyber assistance, including attempts to disguise intent or jailbreak the model.
- Real-time classifiers for cyber and biology misuse that evaluate output as it's generated. For higher-risk cases, generation can be paused while a larger reasoning model reviews the full conversation, and the output can be withheld before it ever reaches you.
- Account-level review that can look across a user's conversations and risk signals to distinguish persistent malicious behavior from legitimate dual-use security work.
- Differentiated access, monitoring, enforcement, and continued testing.
OpenAI is upfront that, especially during the preview, this means some requests may be blocked or refused, and others may simply take longer because generation paused for review. It also acknowledges safeguards may occasionally trip on legitimate dual-use work, where defensive and offensive activity can look similar at first.
Red-teaming at scale
To harden all of this, OpenAI says it threw an unusual amount of compute at safety: over 700,000 A100-equivalent GPU hours of automated red-teaming aimed at finding universal jailbreaks, attacks that generalize across many prompts and contexts rather than working in one narrow setting. That was paired with third-party human expert red-teaming, which continues through the preview, plus a rapid-response process to reproduce and patch newly discovered jailbreaks and fold them into future testing.
Why the rollout is so cautious
OpenAI says it previewed GPT-5.6's plans and capabilities to the U.S. government ahead of launch, and that at the government's request, it's starting with a limited preview for a small group of trusted partners.
This is the second time in a month that the U.S. government has reached into a frontier model launch: two weeks ago, a U.S. export-control directive forced Anthropic to pull Claude Fable 5 and Mythos 5 offline for every customer worldwide.
My read: OpenAI watched what happened to Anthropic and chose to hand over the keys up front rather than get a model yanked after it had already shipped.
A note on testing
In our usual model write-ups, this is where we'd put the new model through its paces, hands-on, on reasoning, web search, and a high-stakes prompt or two. We can't do that here yet. GPT-5.6 is in a limited preview restricted to a small set of trusted partners, so it isn't generally available to test.
How Can I Access GPT-5.6?
During the preview, GPT-5.6 models are available through the API and Codex to a select group of trusted partners and organizations. OpenAI says it plans to make them more broadly available across ChatGPT, Codex, and the API soon.
Pricing
GPT-5.6 is priced per 1M tokens across the three tiers:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5.6 Sol | $5.00 | $30.00 |
| GPT-5.6 Terra | $2.50 | $15.00 |
| GPT-5.6 Luna | $1.00 | $6.00 |
A few pricing details worth knowing:
- More predictable prompt caching, including support for explicit cache breakpoints and a 30-minute minimum cache life.
- For GPT-5.6 and later models, cache writes are billed at 1.25x the model's uncached input rate, while cache reads keep the 90% cached-input discount.
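To see how these rates combine, here is a rough cost estimator in Python. It's a sketch under stated assumptions, not OpenAI's billing logic: the function name and token-bucket parameters are hypothetical, and I'm reading the "90% cached-input discount" as cache reads being billed at 10% of the uncached input rate.

```python
# Per-1M-token list prices from the table above, in USD: (input, output).
PRICES = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}

CACHE_WRITE_MULTIPLIER = 1.25  # cache writes: 1.25x the uncached input rate
CACHE_READ_MULTIPLIER = 0.10   # assumption: 90% discount means 10% of input rate


def estimate_cost(tier, uncached_input=0, cache_write=0, cache_read=0, output=0):
    """Estimate the USD cost of a request from token counts per bucket.

    All arguments after `tier` are token counts. The bucket names are
    illustrative; a real API response may label them differently.
    """
    input_rate, output_rate = PRICES[tier]
    # Express every input bucket in "uncached input token" equivalents.
    weighted_input = (
        uncached_input
        + cache_write * CACHE_WRITE_MULTIPLIER
        + cache_read * CACHE_READ_MULTIPLIER
    )
    return (weighted_input * input_rate + output * output_rate) / 1_000_000


# Example: 10,000 uncached input tokens and 2,000 output tokens on Sol.
cost = estimate_cost("sol", uncached_input=10_000, output=2_000)
print(f"${cost:.4f}")  # $0.1100
```

Under these assumptions, writing a prompt to the cache costs 25% more than sending it uncached once, so caching only pays off when the cached prefix is read again within the cache lifetime.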
Speed via Cerebras
OpenAI also says it's launching GPT-5.6 Sol on Cerebras at up to 750 tokens per second in July, with access initially limited to select customers as capacity expands. That's aimed squarely at latency-sensitive work, where waiting on a frontier model is the bottleneck.
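For a feel of what that throughput means in practice, here is a back-of-the-envelope calculation in Python. The function name is my own, and because 750 tokens per second is an "up to" figure, the result is a best-case lower bound on generation time that ignores queueing, network latency, and time to first token.

```python
PEAK_TOKENS_PER_SECOND = 750  # "up to" figure quoted for Sol on Cerebras


def min_generation_seconds(output_tokens, tokens_per_second=PEAK_TOKENS_PER_SECOND):
    """Best-case seconds needed to stream `output_tokens` at the given rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# A 3,000-token response at the peak rate.
print(min_generation_seconds(3_000))  # 4.0
```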
Final Thoughts
GPT-5.6 has some good things: a cleaner three-tier family, two new ways to push reasoning (max and ultra), and gains in coding, biology, and cybersecurity.
Still, the models are currently in a closed preview, so no independent party can verify any of it yet. The story OpenAI tells, that GPT-5.6 is "so capable we had to coordinate with the government," feels a bit like the most flattering possible marketing. But that doesn't make it false, and we should appreciate caution around cybersecurity.
For now, the honest summary is: the numbers look strong, the access is narrow, and the real test, literally, comes when the models go broadly available. Subscribe to our newsletter, and we will keep you up to date.

I'm a data science writer and editor with contributions to research articles in scientific journals. I'm especially interested in linear algebra, statistics, R, and the like. I also play a fair amount of chess!
FAQs
What is GPT-5.6?
GPT-5.6 is OpenAI's newest model generation. Rather than a single model, it ships as a family of three: Sol (the flagship), Terra (a balanced everyday model), and Luna (a fast, low-cost model).
What's the difference between Sol, Terra, and Luna?
They're capability tiers. Sol is the strongest and most capable model. Terra is the mid-tier, positioned as competitive with GPT-5.5 while being about 2x cheaper. Luna is the budget tier, built for strong capability at the lowest cost. In OpenAI's new naming system, the number (5.6) is the generation and the name (Sol/Terra/Luna) is the durable tier, so each tier can advance on its own schedule.
Can I use GPT-5.6 right now?
Not generally, yet. At launch, GPT-5.6 is in a limited preview available through the API and Codex to a small group of trusted partners and organizations. OpenAI says it plans to make the models more broadly available across ChatGPT, Codex, and the API soon.
Why is the GPT-5.6 rollout so limited?
OpenAI says it previewed the models' plans and capabilities to the U.S. government ahead of launch and, at the government's request, is starting with a limited preview for partners whose participation was shared with the government. OpenAI has said it doesn't think this kind of access process should become the long-term default, and frames it as a short-term step toward broader availability while it works with the Administration on a cyber Executive Order framework.
How much does GPT-5.6 cost?
Per 1M tokens: Sol is $5 input / $30 output, Terra is $2.50 input / $15 output, and Luna is $1 input / $6 output. GPT-5.6 also adds more predictable prompt caching, with cache writes billed at 1.25x the uncached input rate and cache reads keeping the 90% cached-input discount.
What are max and ultra modes?
They're two new ways to push the model harder. max is a new reasoning-effort setting that gives Sol the most time to reason deeply. ultra goes further, using subagents to accelerate complex work beyond what a single agent can do. In the release, "GPT-5.6 Sol Ultra" posts the top Terminal-Bench score.
How does GPT-5.6 perform on benchmarks?
On Terminal-Bench 2.1 (command-line/coding workflows), GPT-5.6 Sol Ultra leads at 91.9%, with plain Sol at 88.8%, ahead of GPT-5.5 (88.0%) and competitors like Claude Mythos 5 (84.3%), Claude Fable 5 (83.4%), Claude Opus 4.8 (78.9%), and Gemini 3.1 Pro Preview (70.7%). OpenAI also reports biology gains on GeneBench v1 and cybersecurity gains on ExploitBench and ExploitGym, with a fuller evaluation suite promised at broad release.
How is GPT-5.6 different from GPT-5.5?
GPT-5.6 is a new generation with a three-tier family and new max/ultra reasoning controls, plus gains in coding, biology, and cybersecurity. It also ships with a heavier safety stack and a more cautious, government-coordinated rollout. By comparison, GPT-5.5 was a broadly available default-model update focused on conversation quality, fewer hallucinations, and personalization.
When will GPT-5.6 come to ChatGPT?
OpenAI hasn't given a firm date, saying only that it plans to bring the models to ChatGPT, Codex, and the API "soon," and to expand availability "in the coming weeks." A Cerebras-hosted version of Sol (up to 750 tokens/second) is also slated for July, initially for select customers.





