Just two days after the release of GPT-5.5 Instant, OpenAI has more big news, with a new release that focuses on three things at once:
- a voice model that can reason while it talks: that's GPT‑Realtime‑2
- live translation across 70+ languages: GPT‑Realtime‑Translate
- and finally, streaming transcription that keeps up with real conversations: GPT‑Realtime‑Whisper
In this article, we'll bring you up to speed on these three models.
What Is GPT-Realtime-2?
GPT-Realtime-2 is the new realtime voice model in OpenAI's API, and the first voice model OpenAI describes as having "GPT-5-class reasoning."
It's built for live voice interactions, i.e., someone is talking to it, not typing into it.
It's designed to keep the conversation moving while it reasons through a request, calls tools, and handles corrections. In other words, it is meant to respond in a way that fits the moment.
Here are some important characteristics compared to the previous model, GPT-Realtime-1.5:
- the context window jumps from 32K to 128K
- developers can now dial in reasoning effort
- small touches like preamble phrases make voice agents feel less robotic
What Is GPT-Realtime-Translate?
GPT-Realtime-Translate is OpenAI's new live speech translation model, supporting 70+ input languages and 13 output languages.
It's built for the voice-to-voice case: each person speaks in their preferred language, and the model translates in real time. It is supposed to hold meaning together when speakers switch context, use regional pronunciation, or drop in domain-specific terms.
What Is GPT-Realtime-Whisper?
GPT-Realtime-Whisper is OpenAI's new streaming speech-to-text model, built for low-latency transcription as the speaker talks.
The original Whisper was designed for completed chunks of audio. The newer streaming version is more useful for live broadcast captions and for voice agents that need to understand the user continuously rather than turn-by-turn.
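To make "streaming rather than turn-by-turn" concrete, here's a rough sketch of what consuming a live transcript could look like over the Realtime API's WebSocket interface. The event names follow existing Realtime API conventions, but the model name and exact session setup for GPT-Realtime-Whisper are assumptions on my part, so treat this as a shape, not a recipe:

```python
# Rough sketch: stream audio chunks in, print transcript deltas as they arrive.
# The model name in the URL is an assumption; event names follow existing
# Realtime API conventions. Verify both against the current API reference.
import asyncio, base64, json, os
import websockets

async def live_captions(audio_chunks):
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime-whisper"  # assumed model name
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: older versions of the websockets library use extra_headers= instead.
    async with websockets.connect(url, additional_headers=headers) as ws:

        async def send_audio():
            # Append raw PCM16 audio as it arrives (e.g., from a microphone).
            async for chunk in audio_chunks:
                await ws.send(json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": base64.b64encode(chunk).decode(),
                }))

        sender = asyncio.create_task(send_audio())

        # Print transcript text the moment the model emits it.
        async for message in ws:
            event = json.loads(message)
            if event["type"].endswith("input_audio_transcription.delta"):
                print(event["delta"], end="", flush=True)
```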
So, if you're lost, here's the structure:
- GPT-Realtime-2 = a full conversational voice agent. Listens, reasons, calls tools, talks back. You use this when you want voice in and voice out.
- GPT-Realtime-Translate = a translation pipe. Speech in language A → speech in language B. It's not having a conversation with anyone; it's converting one stream into another.
- GPT-Realtime-Whisper = a transcription pipe. Speech in → text out. No reasoning, no voice response. You'd use it for live captions, etc.
Key Features of GPT-Realtime-2
The following features apply to GPT-Realtime-2 specifically.
Preambles
Developers can have the model say short filler phrases like "let me check that" or "one moment while I look into it" before its main response.
This is a big feature because people tend to be impatient with awkward silence, and human-style filler is one of those things that makes an agent feel competent.
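OpenAI hasn't published a dedicated preamble parameter in what I've seen, so the simplest way to picture the feature is as part of the session configuration. A minimal sketch, assuming you steer it through the Realtime API's existing instructions field in a session.update event:

```python
# Sketch: nudging the model toward short spoken preambles before slow work.
# Sent as a session.update event over an open Realtime API WebSocket connection.
# Whether GPT-Realtime-2 exposes a dedicated preamble setting is not confirmed;
# this relies on the existing instructions field.
session_update = {
    "type": "session.update",
    "session": {
        "instructions": (
            "Before any tool call or answer that takes more than a moment, "
            "say a brief preamble such as 'Let me check that' in a natural tone, "
            "then continue with the full response."
        ),
    },
}
```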
Parallel tool calls with audio narration
GPT-Realtime-2 can call multiple tools at once and narrate what it's doing while it does. So instead of dead air during a multi-step task, the user gets a running commentary. This is mostly a UX win.
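Tool definitions themselves should look familiar; what's new is that the model can fire several at once and keep talking while it waits. Here's a sketch of a session configured with two hypothetical tools, using the flattened function-tool format the Realtime API already uses:

```python
# Sketch: two hypothetical tools the model may call in parallel while narrating
# progress out loud. The tool schema follows the existing Realtime API format.
session_update = {
    "type": "session.update",
    "session": {
        "tool_choice": "auto",
        "tools": [
            {
                "type": "function",
                "name": "lookup_order",
                "description": "Fetch an order's current status by order ID.",
                "parameters": {
                    "type": "object",
                    "properties": {"order_id": {"type": "string"}},
                    "required": ["order_id"],
                },
            },
            {
                "type": "function",
                "name": "check_refund_policy",
                "description": "Return the refund policy for a product category.",
                "parameters": {
                    "type": "object",
                    "properties": {"category": {"type": "string"}},
                    "required": ["category"],
                },
            },
        ],
    },
}
# Your app still executes each function call the model emits and sends the results
# back; the running commentary ("I'm pulling up your order now...") comes from the model.
```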
Stronger recovery behavior
When something goes wrong, such as a tool failing or a request being ambiguous, the model can say something like "I'm having trouble with that right now" instead of going silent or making something up.
Context window: 32K → 128K
The context window jumps from 32K to 128K tokens.
Adjustable reasoning effort
Developers can now select from minimal, low, medium, high, and xhigh reasoning levels.
Low is the default, which keeps latency down for simple back-and-forth, with more deliberate options when the request is harder.
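OpenAI hasn't shown the exact parameter shape here, so treat the field below as a guess at how a reasoning-effort knob might surface in session configuration; only the level names and the "low" default come from OpenAI's description:

```python
# Sketch: picking a reasoning level per use case. The "reasoning" field name and
# placement are assumptions; the level names (minimal/low/medium/high/xhigh) and
# the "low" default are from OpenAI's description.
EFFORT_BY_USE_CASE = {
    "simple_faq_bot": "minimal",          # latency matters most
    "customer_support": "low",            # the stated default
    "troubleshooting_agent": "high",      # accept slower replies on harder requests
}

session_update = {
    "type": "session.update",
    "session": {
        "reasoning": {"effort": EFFORT_BY_USE_CASE["troubleshooting_agent"]},
    },
}
```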
Better domain understanding and tone control
The model is now better at retaining specialized terminology, such as healthcare terms. It can also adjust its delivery: calmer when resolving an issue, empathetic when a user is frustrated, upbeat when confirming a successful action.
GPT-Realtime-2 Benchmark Results
Let's take a look at the benchmarks. OpenAI is comparing against GPT-Realtime-1.5, which makes for a clean year-over-year picture:
- Big Bench Audio (audio intelligence): 81.4% → 96.6% — a 15.2 point lift.
- Audio MultiChallenge (instruction following in spoken dialogue): 34.7% → 48.5% — a 13.8 point lift.
The Big Bench Audio number is interesting. 96.6% tells us the benchmark is approaching saturation. Audio MultiChallenge, on the other hand, is still under 50%, so this second benchmark result is a useful reality check. "Better than last year's voice model" and "ready for unsupervised production" are different bars.
Worth flagging: these numbers were run at the "high" and "xhigh" reasoning settings. The default in production will be "low," for latency reasons, so users' real-world experience may not match what the headline benchmark results suggest.
How Can I Access GPT-Realtime-2?
All three audio models are available now in the Realtime API:
- GPT-Realtime-2: $32 per 1M audio input tokens ($0.40 for cached input), $64 per 1M audio output tokens.
- GPT-Realtime-Translate: $0.034 per minute.
- GPT-Realtime-Whisper: $0.017 per minute.
The two per-minute-priced models are much easier to reason about for budgeting. Per-token audio pricing is hard to convert into "what will this cost per call?" without actually instrumenting it, so a developer should expect to spend a little time modeling expected costs before shipping or promising anything.
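To see what that modeling looks like, here's a back-of-envelope sketch. The tokens-per-minute figures are assumptions I'm using purely to illustrate the shape of the calculation (roughly in line with earlier realtime audio models, but verify against your own traffic), and it ignores cached input and any text tokens:

```python
# Back-of-envelope cost model for a GPT-Realtime-2 voice call.
# The tokens-per-minute figures are rough assumptions for illustration only;
# measure real token usage before trusting any estimate.

INPUT_PRICE_PER_M = 32.0     # $ per 1M audio input tokens
OUTPUT_PRICE_PER_M = 64.0    # $ per 1M audio output tokens

AUDIO_IN_TOKENS_PER_MIN = 600    # assumption: ~10 audio tokens per second of user speech
AUDIO_OUT_TOKENS_PER_MIN = 1200  # assumption: ~20 audio tokens per second of model speech

def estimate_call_cost(minutes_user_speaking: float, minutes_model_speaking: float) -> float:
    input_tokens = minutes_user_speaking * AUDIO_IN_TOKENS_PER_MIN
    output_tokens = minutes_model_speaking * AUDIO_OUT_TOKENS_PER_MIN
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 5-minute support call where the user talks ~3 minutes and the agent ~2.
print(f"${estimate_call_cost(3, 2):.2f} per call")  # ~$0.21 under these assumptions
```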
Luckily, you can test GPT-Realtime-2 in the Playground, and OpenAI is pointing developers toward Codex (we have written a lot about Codex) with a starter prompt for adding it to existing apps.
GPT-Realtime-2 and Safety
On the safety side, OpenAI says active classifiers can halt sessions that violate its harmful content guidelines, and developers can layer their own guardrails via the Agents SDK, which we've also written about.
Keep in mind: Voice introduces very specific ways things can go wrong. These are worth talking about:
- Accidental activations: The system starts listening or responding when nobody meant to talk to it.
- Ambient audio capture: Once a microphone is on, it picks up everything in the room, not just the user: background conversations, kids, coworkers, a TV, a confidential meeting next door, and so on.
- Voice-cloning concerns: Voice is biometric. Synthetic speech that sounds like a real person can be used for impersonation, fraud, or bypassing voice-authentication systems. This is both an output and input concern.
Final Thoughts
OpenAI is bundling the things that make voice agents feel competent (filler phrases, narrated tool calls, graceful recovery, a big context window, a real reasoning dial) into a model that can also actually reason. What this amounts to for the user: fewer awkward silences and conversations that are less likely to fall apart. That's a big step forward.
