The lethal trifecta
The lethal trifecta is the condition in which a single agent (or MCP server) simultaneously has access to untrusted content, access to sensitive data, and a way to send data out (exfiltration) or cause damage (destruction). That combination is what lets a prompt injection turn into a real breach. CheckMCP detects it statically as OWASP MCP06: a CRITICAL finding that fires when one server's tools cover all three capability classes at once.
What the lethal trifecta is
Coined by Simon Willison, the "lethal trifecta" names the three capabilities that, when an AI agent holds all three at once, make data theft or damage achievable through a single prompt injection: (1) access to untrusted content the agent will read, (2) access to private or sensitive data, and (3) the ability to communicate externally or take consequential actions.
Any one leg alone is usually safe. Untrusted web content is harmless if the agent can't reach secrets; secrets are safe if there is no outbound path. The danger is combinatorial: once all three coexist in the same agent context, an attacker who can plant instructions in the untrusted content (a web page, an email, a tool result) can instruct the agent to read the sensitive data and ship it out — and current models cannot reliably distinguish trusted instructions from injected ones.
Why MCP servers concentrate the risk
MCP makes the trifecta easy to assemble by accident. A single server often bundles a tool that ingests untrusted external content (fetch, scrape, browse, web-search, read-page), a tool that touches sensitive data (read email, query a database, read files, access tokens), and a tool that sends or mutates (post, upload, webhook, email, or delete). An agent that loads that one server now holds all three legs.
For developers integrating third-party MCP servers, the surface is additive across servers: even servers that are individually safe can combine into a trifecta inside one agent. The unit of risk is the agent's full toolset, not any single tool. CheckMCP's static check evaluates the trifecta per server, so it does not flag a trifecta whose legs are spread across several servers loaded into one agent; that case must still be reasoned about at the composition level.
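The composition-level reasoning described above can be sketched as a union over every server an agent loads. This is a minimal illustration, not part of CheckMCP: the function `agent_trifecta`, the `toolset` mapping, and the server names are hypothetical, and the per-server capability classes are assumed to have been determined already.

```python
from typing import Dict, List, Optional, Set

# Either class is enough to complete the third leg.
OUTBOUND = {"exfil", "destructive"}


def agent_trifecta(servers: Dict[str, Set[str]]) -> Optional[Dict[str, List[str]]]:
    """Return which servers contribute each leg, or None if a leg is missing.

    `servers` maps a (hypothetical) server name to the capability classes
    its tools expose.
    """
    legs: Dict[str, List[str]] = {
        "untrusted_content": [],
        "sensitive_data": [],
        "outbound": [],
    }
    for name, classes in sorted(servers.items()):
        if "untrusted_content" in classes:
            legs["untrusted_content"].append(name)
        if "sensitive_data" in classes:
            legs["sensitive_data"].append(name)
        if classes & OUTBOUND:
            legs["outbound"].append(name)
    # The trifecta exists at the agent level only if every leg has a contributor.
    return legs if all(legs.values()) else None


# Three servers, none of which holds all three legs on its own.
toolset = {
    "web": {"untrusted_content"},
    "mail": {"sensitive_data"},
    "hooks": {"exfil"},
}
report = agent_trifecta(toolset)
```

Here `report` names one contributing server per leg, even though no single server would trip a per-server check. Dropping any one of the three servers from `toolset` makes the function return `None`.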
Mitigation
The trifecta is mitigated by breaking at least one leg: isolate untrusted-content ingestion from sensitive data, remove or gate the outbound/destructive path, require human confirmation before consequential actions, and avoid loading a content-fetching server alongside a secrets-bearing server in the same agent.
Because no current model fully resists prompt injection, defense relies on capability separation rather than on the agent being careful. Inventorying which servers contribute which leg before deployment is the practical first step.
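One of the mitigations listed above, requiring human confirmation before consequential actions, could look roughly like the sketch below. It is an assumption-laden illustration rather than a CheckMCP feature: the `GatedSession` class, its `allow` method, and the `confirm` callback are hypothetical, and each tool call is assumed to arrive already labelled with its capability class.

```python
from typing import Callable, Set

# Capability classes that send data out or cause damage.
OUTBOUND = {"exfil", "destructive"}


class GatedSession:
    """Tracks which legs a session has touched and gates the third one."""

    def __init__(self, confirm: Callable[[str], bool]) -> None:
        self.confirm = confirm      # hypothetical human-approval callback
        self.seen: Set[str] = set()  # capability classes used so far

    def allow(self, tool_name: str, capability: str) -> bool:
        if capability in OUTBOUND:
            # Only ask once the session has both read untrusted content
            # and touched sensitive data.
            tainted = {"untrusted_content", "sensitive_data"} <= self.seen
            if tainted and not self.confirm(tool_name):
                return False
        self.seen.add(capability)
        return True


# A reviewer who rejects every request.
session = GatedSession(confirm=lambda tool_name: False)
first = session.allow("fetch_page", "untrusted_content")
early_send = session.allow("send_webhook", "exfil")   # only one leg seen so far
second = session.allow("read_inbox", "sensitive_data")
late_send = session.allow("send_webhook", "exfil")    # both other legs seen
```

The design choice is to enforce separation outside the model: the gate does not try to detect an injection, it only refuses to complete the third leg without approval.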
How CheckMCP handles it
CheckMCP detects the lethal trifecta statically (T1) in its Security pillar (weight 20/100) as OWASP MCP06. In `security.py`, `audit()` sorts every tool into four capability buckets using name-matching regexes. Three buckets are matched on the tool name only: `untrusted_content` (UNTRUSTED, e.g. fetch/scrape/browse/crawl/http/_url/web-search/read-page/download/wiki/rss/feed), `exfil` (EXFIL, e.g. send/post/publish/upload/email/notify/webhook/export/sync/push/transfer/message), and `destructive` (DESTRUCT, e.g. delete/remove/drop/destroy/purge/reset/truncate/revoke/kill/terminate/overwrite/wipe). The fourth, `sensitive_data` (SENSITIVE, e.g. secret/credential/token/api-key/password/vault/email/inbox/read-file/database/sql/payment/private-key/env_), is matched against the name plus the description and schema text.
The trifecta fires when `untrusted_content AND sensitive_data AND (exfil OR destructive)` are all present on the SAME server, and it is emitted as a CRITICAL MCP06 finding: "lethal trifecta: untrusted-content ingestion + sensitive-data access + exfil/destruction -> an injection can exfiltrate." If three or more of the four classes are present but they don't form that exact pattern, it instead emits a HIGH MCP06 "toxic surface: 3 risky capability classes combined" finding.
The trifecta sets `hard_floor`, which in `score.py` caps the overall MCP Score at 69 and flags the report `SECURITY_RISK` (grade D maximum), regardless of how the other pillars score.
This is a static name/schema heuristic. The separate opt-in behavioral evals in `evals.py` (CheckMCP's T4 canary sandbox) can confirm the outbound leg for real via a callback canary: a unique URL is planted in a read-only tool's input, and if the server fetches it, CheckMCP records an `exfiltration_confirmed` finding (HIGH, confidence 1.0), that is, confirmed SSRF/exfiltration.
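The classification and decision logic described above can be approximated in a few lines. This is a sketch under stated assumptions, not the code in `security.py` or `score.py`: the regexes are abbreviated stand-ins built from a subset of the keywords listed above, and `Tool`, `classify`, `audit_sketch`, and `apply_hard_floor` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

# Abbreviated, hypothetical patterns; the real regexes are not reproduced here.
UNTRUSTED = re.compile(r"fetch|scrape|browse|crawl|download|rss|feed", re.I)
EXFIL = re.compile(r"send|post|publish|upload|notify|webhook|export", re.I)
DESTRUCT = re.compile(r"delete|remove|drop|destroy|purge|wipe", re.I)
SENSITIVE = re.compile(r"secret|credential|token|password|vault|inbox|database", re.I)


@dataclass
class Tool:
    name: str
    description: str = ""
    schema_text: str = ""


def classify(tools: Iterable[Tool]) -> Set[str]:
    """Collect the capability classes present across one server's tools."""
    classes: Set[str] = set()
    for tool in tools:
        # These three classes look at the tool name only.
        if UNTRUSTED.search(tool.name):
            classes.add("untrusted_content")
        if EXFIL.search(tool.name):
            classes.add("exfil")
        if DESTRUCT.search(tool.name):
            classes.add("destructive")
        # Sensitive data also considers the description and schema text.
        haystack = " ".join((tool.name, tool.description, tool.schema_text))
        if SENSITIVE.search(haystack):
            classes.add("sensitive_data")
    return classes


def audit_sketch(tools: Iterable[Tool]) -> Optional[Tuple[str, str]]:
    """Return (severity, label) for an MCP06-style finding, or None."""
    classes = classify(tools)
    outbound = "exfil" in classes or "destructive" in classes
    if {"untrusted_content", "sensitive_data"} <= classes and outbound:
        return ("CRITICAL", "lethal trifecta")
    if len(classes) >= 3:
        return ("HIGH", "toxic surface")
    return None


def apply_hard_floor(score: int, hard_floor: bool) -> int:
    """Cap the overall score at 69 when the hard floor is set."""
    return min(score, 69) if hard_floor else score


trifecta = audit_sketch([Tool("fetch_page"), Tool("read_inbox"), Tool("send_webhook")])
toxic = audit_sketch([Tool("fetch_page"), Tool("post_message"), Tool("delete_item")])
capped = apply_hard_floor(92, hard_floor=trifecta is not None and trifecta[0] == "CRITICAL")
```

The second example has three risky classes but no sensitive-data leg, so it lands in the "toxic surface" branch rather than the trifecta. Because matching is substring-based, a name-only heuristic like this can both over-match and miss tools with unusual names, which is why the behavioral canary exists as a separate confirmation step.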