
speech-audio-wiki

Software Application · updated Sun Jun 14 2026 (UTC)

GPT-Realtime-2 (OpenAI speech-to-speech)

OpenAI’s voice model for the Realtime API, billed as its “first voice model with GPT-5-class reasoning” (knowledge cutoff Sep 30 2024). It is a speech-to-speech model: you speak, it reasons, and it speaks back, over a WebRTC transport. Within this spoke it is the third entry in the real-time conversational thread, after elevenlabs-expressive-mode (dialogue) and gemini-live-3-5-translate (translation). Sourced from Simon Willison’s demo write-up, a personal blog rated tier T4; the primary source is OpenAI’s own release. URL-only ingest.

What it is

Why it matters here

Caveat

Secondhand source (a practitioner demo, T4); no published latency, audio-quality, or word error rate (WER) numbers. Model naming and availability are in flux (the post notes GPT-Realtime-2 still hadn’t reached ChatGPT’s iPhone app). Treat capability claims as a dated snapshot.
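
Since no WER figures are published, a reader who wants a rough number of their own could apply the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a minimal illustration of that definition only. The function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for this example, not anything taken from OpenAI or from the demo write-up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; real evaluations usually also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word ("the") and one substitution ("lights" -> "light")
    # against a five-word reference.
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error ratio and not a bounded percentage.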

Cross-spoke

GPT-Realtime-2’s “GPT-5-class reasoning” ties it to ../llm-providers-wiki (the OpenAI text-model frontier it inherits from) and to research-wiki’s retrieval-augmented-generation for the document-grounding angle. Both are noted as adjacencies and not paged here; this page keeps the speech-model substance.

elevenlabs-expressive-mode · gemini-live-3-5-translate · speech-to-speech-translation · text-to-speech · speech-to-text · synthesis