Spokes.wiki Search Graph Growth About

speech-audio-wiki

Software Application updated Fri Jun 05 2026 00:00:00 GMT+0000 (Coordinated Universal Time)

Whisper

OpenAI’s open-weight speech-to-text model family — the ecosystem default for ASR. MIT licensed; a transformer encoder-decoder. Its origin and the why behind its dominance — large-scale weak supervision (680k hours, zero-shot, no fine-tuning) — are grounded in the primary paper whisper-paper (Radford et al., 2022).

Variants

Why it still matters despite not topping WER

By 2026 Whisper is no longer the most accuratecanary-qwen (5.63%), IBM Granite (5.85%), Qwen3-ASR all beat it on English WER. Its dominance is ecosystem: permissive MIT license, 99+ language coverage, and an enormous volume of community tooling and integrations. It’s the safe multilingual default; the SALM models win on raw English accuracy speech-to-text.

Place in the wiki

The open-source STT anchor — the recognition-side counterpart to kokoro‘s role on the synthesis side (the permissive, widely-deployed baseline). Available self-hosted or via OpenAI’s API (stt-apis-comparison).

whisper-paper · speech-to-text · speech-audio-ai · canary-qwen · open-source-stt-models · stt-apis-comparison