

Defined term · domain · updated Fri Jun 05 2026

Speech-to-text (STT / ASR)

Automatic speech recognition: transcribing speech audio into text. It is the recognition branch of speech-audio-ai, the mirror of text-to-speech's synthesis.

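How it's measured

Chiefly by word error rate (WER): the substitutions, deletions, and insertions needed to turn a system's transcript into a reference transcript, divided by the number of words in the reference. Lower is better.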
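
A minimal sketch of WER scoring, assuming plain whitespace tokenization and no text normalization: the score is the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the reference length. The function name `word_error_rate` is illustrative and not taken from any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance (illustrative sketch)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed so far
    # and the first j hypothesis words (one dynamic-programming row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Published benchmarks usually normalize casing and punctuation before scoring, so raw numbers from a sketch like this will not line up exactly with leaderboard figures.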

The market shape

Mirrors the rest of speech-audio-ai: commercial APIs hold a thin accuracy edge (ElevenLabs Scribe at ~3.3% WER on English, Deepgram at 5.26% on batch transcription; see stt-apis-comparison) over the open-source field (whisper, canary-qwen, Parakeet, Granite, Qwen3-ASR; see open-source-stt-models), with the usual build-vs-buy crossover: at high enough volume, self-hosting an open model becomes cheaper than paying per-minute API fees.
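
The build-vs-buy crossover reduces to a break-even volume: self-hosting carries a fixed monthly cost (GPUs, operations) plus a small marginal cost per audio minute, while an API charges only per minute. A minimal sketch follows; the function names are hypothetical and every price in it is a placeholder assumption, not any vendor's actual rate.

```python
import math


def breakeven_minutes_per_month(api_price_per_min: float,
                                selfhost_fixed_per_month: float,
                                selfhost_price_per_min: float) -> float:
    """Monthly audio minutes above which self-hosting beats a per-minute API.

    Solves api_price * m == fixed + selfhost_price * m for m.
    All inputs are caller-supplied cost assumptions.
    """
    margin = api_price_per_min - selfhost_price_per_min
    if margin <= 0:
        # Self-hosting costs at least as much per minute, so it never catches up.
        return math.inf
    return selfhost_fixed_per_month / margin


def cheaper_option(minutes: float, api_price_per_min: float,
                   selfhost_fixed_per_month: float,
                   selfhost_price_per_min: float) -> str:
    """Return 'api' or 'self-host' for a given monthly volume (ties go to the API)."""
    api_cost = api_price_per_min * minutes
    selfhost_cost = selfhost_fixed_per_month + selfhost_price_per_min * minutes
    return "self-host" if selfhost_cost < api_cost else "api"


if __name__ == "__main__":
    # Illustrative numbers only: $0.005/min API, $2,000/month fixed, $0.001/min marginal.
    m = breakeven_minutes_per_month(0.005, 2000.0, 0.001)
    print(f"break-even: {m:,.0f} min/month")
    print(cheaper_option(100_000, 0.005, 2000.0, 0.001))
    print(cheaper_option(1_000_000, 0.005, 2000.0, 0.001))
```

With those placeholder inputs the break-even lands at about 500,000 minutes (roughly 8,300 hours) of audio per month; below that the API is cheaper, above it self-hosting wins.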

The defining 2026 trend — STT meets the LLM

Top accuracy now comes from SALM-style models that bolt an LLM decoder onto a speech encoder (canary-qwen = FastConformer encoder + Qwen3 LLM; Granite-Speech; Qwen3-ASR). Recognition is reframed as language modeling, which makes this the bridge to llm-providers-wiki (gemini, Qwen/Llama). whisper no longer leads on WER but wins on ecosystem: MIT license, broad language coverage, and mature tooling.
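
At the interface level, the SALM-style recipe is: run the speech encoder, map each encoder frame into the LLM's token-embedding space with a small projector, and give the LLM one sequence of prompt embeddings plus audio embeddings, from which it decodes the transcript as ordinary text. The pure-Python sketch below shows only that plumbing with toy dimensions. The function names, the single linear projector, the prompt-then-audio ordering, and the 80 ms-per-frame rate (8x subsampling of 10 ms features, typical of FastConformer-style encoders) are all assumptions for illustration; real models differ in projector design and in where the audio sits in the prompt.

```python
import math
from typing import List

Vector = List[float]


def encoder_frame_count(audio_seconds: float, ms_per_frame: float = 80.0) -> int:
    """Number of encoder output frames, assuming a fixed output frame rate
    (80 ms/frame is an assumed FastConformer-style default)."""
    return math.ceil(audio_seconds * 1000.0 / ms_per_frame)


def project(frames: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Hypothetical linear projector: maps each d_enc-dim encoder frame to a
    d_llm-dim vector. `weight` has d_llm rows, each of length d_enc."""
    return [[sum(w * x for w, x in zip(row, frame)) for row in weight]
            for frame in frames]


def build_llm_input(prompt_embeds: List[Vector],
                    audio_embeds: List[Vector]) -> List[Vector]:
    """Concatenate text-prompt embeddings and projected audio embeddings into the
    single sequence the LLM decoder attends over (ordering is an assumption)."""
    return prompt_embeds + audio_embeds


if __name__ == "__main__":
    # Toy dimensions: d_enc = 3, d_llm = 2.
    encoder_out = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # 2 encoder frames
    projector = [[1.0, 1.0, 1.0], [0.5, 0.0, 0.0]]     # 2 x 3 weight matrix
    prompt = [[0.1, 0.2]]                              # stand-in for an embedded text prompt
    seq = build_llm_input(prompt, project(encoder_out, projector))
    print(len(seq), seq)
    print(encoder_frame_count(30.0))  # frames for a 30 s clip at 80 ms/frame
```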

speech-audio-ai · text-to-speech · whisper · canary-qwen · open-source-stt-models · stt-apis-comparison · tts-benchmarks