Defined Term · domain · updated 2026-06-05 (UTC)

Audio & music generation

Generating music and audio from text or from reference audio. This is the generation branch of speech-audio-ai. Unlike speech synthesis (text-to-speech), it produces songs, instrumentals, and sound design rather than spoken words.

The field (2026)

What makes this branch different

  1. Copyright is the dominant axis, not just quality (see ai-music-copyright). Whether the output can be used legally varies more from tool to tool than how good it sounds: compare the Suno litigation, the Udio settlement, and Stable Audio’s licensed data.
  2. Vocals vs instrumental split: the song generators (suno, udio, ElevenLabs) produce vocals; the open/instrumental tools (stable-audio, AIVA) do not.
  3. A growing but contested market: roughly $0.57B (2024) to roughly $1.98B (2026), yet AI tracks show 25–40% lower save rates and 15–25% higher skip rates than human recordings (ai-music-generators-2026). For now these are demo tools, not finished-product tools.
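To put the market figures in item 3 on an annual footing, the sketch below computes the compound annual growth rate they imply. The function name `implied_cagr` is illustrative, and the calculation assumes the two quoted values are comparable year-end estimates exactly two years apart, which the source does not state.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Approximate market sizes quoted above, in billions of USD.
# Assumption: both are year-end estimates, two years apart.
growth = implied_cagr(0.57, 1.98, years=2)
print(f"Implied annual growth: {growth:.1%}")  # roughly 86% per year
```

A rate this high reflects growth from a small base; it says nothing about listener retention, which is what the save and skip figures measure.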

Shared with the rest of the wiki

This branch shares the wiki's open-vs-closed structure (stable-audio is the open wedge, as fish-audio-s2-pro is for TTS) and its Elo-style ranking (tts-benchmarks). Here, however, copyright, not license terms alone, is the sharper constraint.
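For readers unfamiliar with Elo-style ranking, here is a minimal sketch of the classic pairwise update. The K-factor of 32 and the 400-point scale are the conventional chess defaults, used here as assumptions; the benchmarks this wiki refers to may use different constants or a different fitting procedure, and the function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a: float, rating_b: float, score_a: float,
                   k: float = 32.0) -> tuple[float, float]:
    """Return new (A, B) ratings after one comparison.

    score_a is 1.0 if A was preferred, 0.0 if B was, 0.5 for a tie.
    k = 32 is an assumed default, not a value taken from any benchmark.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two generators start level; a listener prefers A in a blind comparison.
new_a, new_b = update_ratings(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)  # 1516.0 1484.0
```

Because the update is zero-sum, ratings only carry meaning relative to the other systems in the same pool.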

speech-audio-ai · ai-music-copyright · suno · udio · stable-audio · ai-music-generators-2026