

Software Application · updated 2026-06-05

Canary-Qwen 2.5B

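NVIDIA’s open-weight speech-to-text model that tops the Open ASR Leaderboard with a word error rate (WER) of 5.63% on English. 2.5B parameters, CC-BY-4.0 license, English-only, RTFx 418 (inverse real-time factor: roughly 418 seconds of audio transcribed per second of compute). See open-source-stt-models.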
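To make the two headline metrics concrete, here is a minimal sketch of how WER and RTFx are conventionally computed. The function names are illustrative only, and leaderboard pipelines normally apply text normalization (casing, punctuation) before scoring, which is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio handled per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(rtfx(audio_seconds=418.0, processing_seconds=1.0))
```

Read this way, a WER of 5.63% means about 5.6 word-level errors per 100 reference words, and an RTFx of 418 means an hour of audio takes on the order of nine seconds to transcribe on the benchmark hardware.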

The architecture that defines the 2026 trend

A SALM (Speech-Augmented Language Model): a FastConformer speech encoder feeds an unmodified Qwen3-1.7B LLM decoder. ASR is thereby reframed as language modeling over speech features. This is the recognition-side instance of the “speech on an LLM backbone” convergence the wiki tracks across text-to-speech (Llama-based TTS, neural-audio-codec) and music (suno). Siblings built on the same idea: IBM Granite-Speech, Alibaba Qwen3-ASR.
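The data flow of a speech-augmented LLM can be sketched at the shape level as follows. This is a toy illustration, not NVIDIA's implementation: the function names, the single linear projection, the tiny dimensions, and the position of the audio slot in the prompt are all assumptions made for the example. It shows only how encoder frames might be mapped into the LLM's embedding space and spliced into the prompt sequence that the decoder then continues as ordinary text.

```python
from typing import List

Vector = List[float]


def project(frames: List[Vector], weight: List[Vector]) -> List[Vector]:
    """Map encoder frames (T x d_enc) into LLM embedding space (T x d_llm).

    `weight` is laid out as d_llm rows of length d_enc (hypothetical adapter).
    """
    return [
        [sum(x * w for x, w in zip(frame, row)) for row in weight]
        for frame in frames
    ]


def splice_audio(
    prompt_embeds: List[Vector],
    audio_embeds: List[Vector],
    audio_slot: int,
) -> List[Vector]:
    """Insert projected audio frames at the prompt's audio placeholder position."""
    return prompt_embeds[:audio_slot] + audio_embeds + prompt_embeds[audio_slot:]


if __name__ == "__main__":
    # Stand-ins for real tensors: 2 encoder frames with d_enc = 2, d_llm = 3.
    encoder_frames = [[1.0, 2.0], [3.0, 4.0]]
    adapter_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Stand-ins for the embedded text prompt on either side of the audio slot.
    prompt = [[0.0, 0.0, 0.0], [9.0, 9.0, 9.0]]

    audio = project(encoder_frames, adapter_weight)
    llm_inputs = splice_audio(prompt, audio, audio_slot=1)

    # The decoder would now autoregressively generate the transcript tokens
    # conditioned on this mixed text-and-audio sequence.
    print(len(llm_inputs), "input positions of width", len(llm_inputs[0]))
```

The point of the design is that the decoder never needs a speech-specific output head: once the audio sits in its embedding space, transcription is next-token prediction.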

Significance

speech-to-text · speech-audio-ai · whisper · open-source-stt-models · neural-audio-codec