Defined Term · concept · updated Tue Jun 09 2026 (UTC)

Audio deepfake

An audio deepfake is AI-synthesized speech that “convincingly mimics specific individuals … phrases or sentences they have never spoken.” This is the safety and consent axis that the synthesis flagged as a coverage gap: the dark side of the voice-cloning and text-to-speech capability the wiki otherwise celebrates. Source: Wikipedia.

How it’s made

There are two routes. Synthetic TTS runs text analysis → acoustic model → vocoder (as in wavenet). Imitation, or voice conversion, alters the style or prosody of a real recording, often via GANs. In other words, this is the same text-to-speech and voice-cloning tech this wiki tracks, pointed at impersonation.
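As a rough structural illustration of the synthetic-TTS route, the three stages can be sketched as composed functions. This is a toy sketch only: every name here (`analyze_text`, `acoustic_model`, `vocoder`, `Frame`) is hypothetical, and each stage is a stub standing in for what would be a trained model in a real system.

```python
from dataclasses import dataclass
from typing import List

# All names below are hypothetical placeholders, not a real TTS library.


@dataclass
class Frame:
    """One acoustic frame: a stand-in for a spectrogram column."""
    symbol: str
    energy: float


def analyze_text(text: str) -> List[str]:
    """Stage 1 (text analysis): normalize text into symbol units.

    Real front ends emit phonemes; here each letter stands in for one.
    """
    return [ch for ch in text.lower() if ch.isalpha()]


def acoustic_model(symbols: List[str], frames_per_symbol: int = 2) -> List[Frame]:
    """Stage 2 (acoustic model): map symbols to acoustic frames.

    A trained network would predict spectrogram frames; this stub repeats
    each symbol a fixed number of times with a dummy energy value.
    """
    return [Frame(s, 1.0) for s in symbols for _ in range(frames_per_symbol)]


def vocoder(frames: List[Frame], samples_per_frame: int = 4) -> List[float]:
    """Stage 3 (vocoder): turn frames into waveform samples.

    A neural vocoder would generate audio; this stub emits constant samples.
    """
    return [f.energy for f in frames for _ in range(samples_per_frame)]


def synthesize(text: str) -> List[float]:
    """Chain the three stages: text analysis -> acoustic model -> vocoder."""
    return vocoder(acoustic_model(analyze_text(text)))


waveform = synthesize("Hi!")
print(len(waveform))  # 2 symbols * 2 frames * 4 samples = 16
```

The point of the sketch is the data flow: text becomes symbols, symbols become frames, frames become samples. Voice cloning conditions the later stages on a target speaker, which the stubs above do not model.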

Harms — and why “rights outweigh quality” bites

Concrete harms include a 2019 CEO-voice scam (€220k); a McAfee survey reporting “one person in ten … targeted by an AI voice cloning scam, 77% … lost money”; the January 2024 fake-Biden robocalls; and consent violations (Jeff Geerling’s and Gayanne Potter’s voices were cloned without permission). These cases are the lived form of the spoke’s thesis that rights and consent can outweigh raw quality.

Responses — provenance & detection

Detection classifies audio as “Spoof” or “Bonafide” (ASVspoof, DARPA SemaFor); “limited non-English research” remains a gap. Provenance is the other lever: watermarking such as SynthID (the same scheme gemini-live-3-5-translate stamps on its output). Policy is moving too: the FCC banned AI voices in robocalls (Feb 2024), and China mandates deepfake labeling. Together these form the counterweight to unconstrained voice-cloning.
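To make the Spoof/Bonafide decision concrete, here is a minimal sketch of how a detector's scores might be thresholded and its two error types counted. It assumes a hypothetical detector that outputs a score where higher means more likely genuine; the scores, labels, and function names are made up for illustration and do not come from ASVspoof or SemaFor.

```python
from typing import List, Tuple


def classify(score: float, threshold: float) -> str:
    """Label one utterance. Assumed convention: higher score = more genuine."""
    return "bonafide" if score >= threshold else "spoof"


def error_rates(
    scores: List[float], labels: List[str], threshold: float
) -> Tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at a given threshold.

    False accept: a spoofed utterance classified as bonafide.
    False reject: a bonafide utterance classified as spoof.
    """
    spoof_scores = [s for s, lab in zip(scores, labels) if lab == "spoof"]
    bona_scores = [s for s, lab in zip(scores, labels) if lab == "bonafide"]
    false_accepts = sum(classify(s, threshold) == "bonafide" for s in spoof_scores)
    false_rejects = sum(classify(s, threshold) == "spoof" for s in bona_scores)
    return false_accepts / len(spoof_scores), false_rejects / len(bona_scores)


# Made-up scores for illustration only.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
labels = ["bonafide", "bonafide", "bonafide", "spoof", "spoof", "spoof"]

far, frr = error_rates(scores, labels, threshold=0.5)
print(f"false accept rate: {far:.2f}, false reject rate: {frr:.2f}")
```

Raising the threshold trades false accepts for false rejects: stricter detection lets fewer deepfakes through but flags more genuine speakers, which matters when the detector has seen little data for a given language.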

voice-cloning · text-to-speech · wavenet · gemini-live-3-5-translate · speech-audio-ai
