OpenAI expands API with GPT-Realtime-2, translation and speech-to-text models


SOURCE: BESTMEDIAINFO.COM
MAY 08, 2026

BestMediaInfo Bureau

08 May 2026 16:43 IST

New Delhi: OpenAI has introduced a new set of realtime voice models for developers, expanding its API offerings with tools focused on live translation, speech transcription and voice-based AI interactions.

The company announced three new models, GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper, designed to support voice applications capable of handling conversations, reasoning through requests and processing speech in real time.

According to OpenAI, the update reflects growing use of voice as an interface for software applications, particularly in customer service, travel, productivity and multilingual communication.

GPT-Realtime-2 is the company’s first voice model built with GPT-5-class reasoning capabilities. OpenAI said the model can manage more complex spoken requests, handle interruptions and corrections during conversations, and interact with external tools while continuing live dialogue.

The model also introduces features including adjustable reasoning levels, larger context windows and more control over tone and delivery during conversations. OpenAI stated that the context window has been expanded from 32K to 128K tokens to support longer interactions and more complex workflows.

The company said GPT-Realtime-2 showed improved performance over its earlier realtime voice model in internal audio intelligence and instruction-following benchmarks.

OpenAI also introduced GPT-Realtime-Translate, a live translation model capable of translating speech from more than 70 input languages into 13 output languages during conversations. The model is intended for multilingual customer support, education, travel and media applications.

Several companies including Deutsche Telekom, Vimeo and BolnaAI are testing or integrating the translation model into their products and services.

Prateek Sachan, Co-founder and CTO of BolnaAI, said, “Building voice AI for India means handling diverse regional phonetics. In our evals across Hindi, Tamil, and Telugu, GPT-Realtime-Translate delivered 12.5% lower Word Error Rates than any other model we tested, along with lower fallback rates, higher task completion, and latency that sustained natural conversation. It sets a new standard for multilingual voice AI.”
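For readers unfamiliar with the metric in the quote, Word Error Rate is conventionally computed as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below shows that standard calculation; it is not BolnaAI's evaluation code, the function names are hypothetical, and the helper for comparing two scores assumes the "12.5% lower" figure is a relative reduction, which the quote does not specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,                      # deletion
                    current_row[j - 1] + 1,                   # insertion
                    previous_row[j - 1] + substitution_cost,  # substitution or match
                )
            )
        previous_row = current_row

    return previous_row[-1] / len(ref_words)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fractional improvement of new_wer over baseline_wer (assumed relative reading)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Under the relative reading, a hypothetical baseline WER of 0.16 falling to 0.14 would count as a 12.5% reduction; these two figures are illustrative only and do not come from the article.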

The third model, GPT-Realtime-Whisper, is a streaming speech-to-text tool that transcribes spoken conversations live as users speak. OpenAI said the model could be used for live captions, meeting notes, customer support workflows and enterprise communication systems.

The company also highlighted safety measures within the Realtime API, including classifiers that can halt sessions if conversations violate its harmful-content policies. Developers can add further safeguards through OpenAI’s Agents SDK.

The new models are available through OpenAI’s Realtime API. GPT-Realtime-2 is priced at $32 per one million audio input tokens and $64 per one million audio output tokens, while GPT-Realtime-Translate and GPT-Realtime-Whisper are priced on a per-minute basis.
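As a rough illustration of what the token-based pricing means in practice, the sketch below estimates the cost of a GPT-Realtime-2 session from the two rates quoted above. The function name and the token counts in the usage note are hypothetical; the article does not say how many tokens a minute of audio consumes, and per-minute rates for the other two models are not given, so they are left out.

```python
from decimal import Decimal

# Rates quoted in the article, in USD per one million audio tokens.
INPUT_PRICE_PER_MILLION = Decimal("32")
OUTPUT_PRICE_PER_MILLION = Decimal("64")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_audio_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of a session from its audio token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost
```

For example, a session with an assumed 50,000 audio input tokens and 20,000 audio output tokens would come to $1.60 plus $1.28, or $2.88 in total. Decimal is used instead of float so that currency amounts are not affected by binary rounding.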

OpenAI said developers can test the models through its Playground platform and integrate them into existing applications using its developer tools.