MCP concepts

The lethal trifecta

The lethal trifecta is the condition in which a single agent (or MCP server) simultaneously has access to untrusted content, access to sensitive data, and a way to send data out (exfiltration) or cause damage (destruction). That combination is what lets a prompt injection turn into a real breach. CheckMCP detects it statically as OWASP MCP06: a CRITICAL finding that fires when one server's tools cover all three capability classes at once.

What the lethal trifecta is

Coined by Simon Willison, the "lethal trifecta" names the three capabilities that, when an AI agent holds all three at once, make data theft or damage achievable through a single prompt injection: (1) access to untrusted content the agent will read, (2) access to private or sensitive data, and (3) the ability to communicate externally or take consequential actions.

Any one leg alone is usually safe. Untrusted web content is harmless if the agent can't reach secrets; secrets are safe if there is no outbound path. The danger is combinatorial: once all three coexist in the same agent context, an attacker who can plant instructions in the untrusted content (a web page, an email, a tool result) can instruct the agent to read the sensitive data and ship it out — and current models cannot reliably distinguish trusted instructions from injected ones.

Why MCP servers concentrate the risk

MCP makes the trifecta easy to assemble by accident. A single server often bundles a tool that ingests untrusted external content (fetch, scrape, browse, web-search, read-page), a tool that touches sensitive data (read email, query a database, read files, access tokens), and a tool that sends or mutates (post, upload, webhook, email, or delete). An agent that loads that one server now holds all three legs.

For developers integrating third-party MCP servers, the surface is additive across servers: even servers that are individually safe can combine into a trifecta inside one agent. The unit of risk is the agent's full toolset, not any single tool. CheckMCP's static check evaluates the trifecta per server, so a trifecta assembled across multiple servers in one agent must still be reasoned about at the composition level.
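The composition-level reasoning can be sketched as a union over the capability classes each loaded server contributes. This is a minimal illustration, not part of CheckMCP: the function names and the `servers` mapping are hypothetical, and the class labels are assumed to be the four bucket names used on this page.

```python
# Hypothetical helper names; the class labels mirror the four buckets on this page.
OUTBOUND = {"exfil", "destructive"}


def has_trifecta(classes):
    """True when one set of capability classes covers all three legs."""
    return (
        "untrusted_content" in classes
        and "sensitive_data" in classes
        and bool(classes & OUTBOUND)
    )


def agent_has_trifecta(servers):
    """servers maps a server name to the set of classes its tools cover.

    The agent's risk is evaluated on the union of every loaded server,
    not on each server in isolation.
    """
    combined = set().union(*servers.values())
    return has_trifecta(combined)


# Three servers that are individually below the trifecta threshold.
servers = {
    "web": {"untrusted_content"},
    "vault": {"sensitive_data"},
    "mailer": {"exfil"},
}
per_server = {name: has_trifecta(classes) for name, classes in servers.items()}
combined = agent_has_trifecta(servers)
```

Here every entry in `per_server` is `False`, while `combined` is `True`: no single server trips the check, but the agent that loads all three holds every leg.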

Mitigation

The trifecta is mitigated by breaking at least one leg. Options include isolating untrusted-content ingestion from sensitive data, removing or gating the outbound/destructive path, requiring human confirmation before consequential actions, and not loading a content-fetching server alongside a secrets-bearing server in the same agent.

Because no current model fully resists prompt injection, defense relies on capability separation rather than on the agent being careful. Inventorying which servers contribute which leg before deployment is the practical first step.
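One way to gate the outbound/destructive leg is a confirmation wrapper around tool calls. The sketch below is an assumption about how such a gate could look in an agent host; `gated_call`, `confirm`, and the class labels passed in are hypothetical, not an MCP or CheckMCP API.

```python
# Classes whose tools send data out or mutate state (labels assumed from this page).
CONSEQUENTIAL = {"exfil", "destructive"}


def gated_call(tool_name, tool_classes, call, confirm):
    """Run call() only if the tool is non-consequential or a human approves.

    tool_classes: set of capability classes assigned to the tool.
    call: zero-argument callable that performs the actual tool invocation.
    confirm: callable taking the tool name and returning True or False.
    """
    if tool_classes & CONSEQUENTIAL and not confirm(tool_name):
        raise PermissionError(f"{tool_name} blocked: confirmation denied")
    return call()


def deny_all(tool_name):
    # Stand-in for a real human prompt; always refuses.
    return False


# A read-only fetch runs without prompting.
page = gated_call("fetch_page", {"untrusted_content"}, lambda: "<html>", deny_all)

# An outbound tool is blocked unless the human approves.
try:
    gated_call("send_email", {"exfil"}, lambda: "sent", deny_all)
    blocked = False
except PermissionError:
    blocked = True
```

The gate sits in the host, outside the model's control, so an injected instruction cannot talk its way past it.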

How CheckMCP handles it

CheckMCP detects the lethal trifecta statically (T1) in its Security pillar (weight 20/100) as OWASP MCP06. In `security.py`, `audit()` sorts every tool into four capability buckets using name-matching regexes. Three are matched on the tool name only: `untrusted_content` (UNTRUSTED, e.g. fetch/scrape/browse/crawl/http/_url/web-search/read-page/download/wiki/rss/feed), `exfil` (EXFIL, e.g. send/post/publish/upload/email/notify/webhook/export/sync/push/transfer/message), and `destructive` (DESTRUCT, e.g. delete/remove/drop/destroy/purge/reset/truncate/revoke/kill/terminate/overwrite/wipe). The fourth, `sensitive_data` (SENSITIVE, e.g. secret/credential/token/api-key/password/vault/email/inbox/read-file/database/sql/payment/private-key/env_), is matched against the name plus the description and schema text.

The trifecta fires when `untrusted_content AND sensitive_data AND (exfil OR destructive)` are all present on the SAME server, emitted as a CRITICAL MCP06 finding: "lethal trifecta: untrusted-content ingestion + sensitive-data access + exfil/destruction -> an injection can exfiltrate." If three or more of the four classes are present but they don't form that exact pattern, it instead emits a HIGH MCP06 "toxic surface: 3 risky capability classes combined" finding.

The trifecta sets `hard_floor`, which in `score.py` caps the overall MCP Score at 69 and flags the report `SECURITY_RISK` (grade D maximum), regardless of how the other pillars score.

This is a static name/schema heuristic. The separate opt-in behavioral evals in `evals.py` (CheckMCP's T4 canary sandbox) can confirm the outbound leg for real via a callback canary: a unique URL is planted in a read-only tool's input, and if the server fetches it, CheckMCP records an `exfiltration_confirmed` finding (HIGH, confidence 1.0), i.e. confirmed SSRF/exfiltration.
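The static classification and the MCP06 decision rule can be approximated in a few lines. This is a sketch under stated assumptions, not the code in `security.py`: the regexes are abbreviated from the example keywords listed here, the `Tool` type and function names are hypothetical, and schema text is left out for brevity.

```python
import re
from dataclasses import dataclass

# Abbreviated, hypothetical patterns built from the example keywords on this page.
UNTRUSTED = re.compile(r"fetch|scrape|browse|crawl|download|rss|feed", re.I)
EXFIL = re.compile(r"send|post|publish|upload|notify|webhook|export|push", re.I)
DESTRUCT = re.compile(r"delete|remove|drop|destroy|purge|truncate|wipe", re.I)
SENSITIVE = re.compile(r"secret|credential|token|password|vault|inbox|database|sql", re.I)


@dataclass
class Tool:
    name: str
    description: str = ""


def classify(tools):
    """Return the set of capability classes present on one server."""
    classes = set()
    for tool in tools:
        # Three classes are matched on the tool name only.
        if UNTRUSTED.search(tool.name):
            classes.add("untrusted_content")
        if EXFIL.search(tool.name):
            classes.add("exfil")
        if DESTRUCT.search(tool.name):
            classes.add("destructive")
        # Sensitive data is matched on the name plus the description text.
        if SENSITIVE.search(f"{tool.name} {tool.description}"):
            classes.add("sensitive_data")
    return classes


def mcp06_finding(classes):
    """Map capability classes to a (severity, label) pair, or None."""
    outbound = "exfil" in classes or "destructive" in classes
    if "untrusted_content" in classes and "sensitive_data" in classes and outbound:
        return ("CRITICAL", "lethal trifecta")
    if len(classes) >= 3:
        return ("HIGH", "toxic surface")
    return None


tools = [
    Tool("fetch_url"),
    Tool("query", "runs SQL against the database"),
    Tool("send_message"),
]
finding = mcp06_finding(classify(tools))  # ('CRITICAL', 'lethal trifecta')
```

Note that `query` lands in `sensitive_data` only because of its description, which is why that bucket looks beyond the tool name. A server exposing only fetch, send, and delete tools would cover three classes without sensitive data and so fall through to the HIGH "toxic surface" branch.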


The lethal trifecta — FAQ

What are the three parts of the lethal trifecta?

Access to untrusted content (web pages, emails, tool results that may carry injected instructions), access to sensitive or private data (secrets, files, databases, mailboxes), and an exfiltration or destruction path (the ability to send data out or perform destructive operations). All three present in one agent context is the trifecta.

Is one MCP tool enough to create the lethal trifecta?

No — it is combinatorial across a toolset. CheckMCP flags it (OWASP MCP06) when a single server's tools cover untrusted-content ingestion, sensitive-data access, and an exfil-or-destruction path together. The same combination can also form across multiple servers loaded into one agent; CheckMCP's check is per server, so cross-server trifectas have to be reasoned about at the composition level.

How does CheckMCP penalize a server with the lethal trifecta?

It raises a CRITICAL MCP06 finding in the Security pillar and sets a hard floor: the overall MCP Score is capped at 69 (grade D maximum) and the report is flagged SECURITY_RISK, regardless of how the other pillars score.
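The cap behaves like a clamp on the final score. A minimal sketch, assuming a 0-100 score; the function name, constant name, and return shape are hypothetical and do not reflect the actual code in `score.py`.

```python
# Cap value and flag name are taken from this page; everything else is assumed.
TRIFECTA_SCORE_CAP = 69


def apply_hard_floor(raw_score, hard_floor):
    """Return (final_score, flag) for a report.

    When hard_floor is set, the score can never exceed the cap, no matter
    how well the other pillars scored. Scores already below the cap are
    left unchanged.
    """
    if hard_floor:
        return min(raw_score, TRIFECTA_SCORE_CAP), "SECURITY_RISK"
    return raw_score, None


capped = apply_hard_floor(92, hard_floor=True)     # (69, 'SECURITY_RISK')
untouched = apply_hard_floor(92, hard_floor=False)  # (92, None)
```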

Can CheckMCP confirm a server actually exfiltrates, not just that it could?

The static MCP06 check only flags the capability combination from tool names and schemas. The separate opt-in behavioral evals add a callback canary: a unique URL is planted in a read-only tool's input, and if the server fetches it, CheckMCP records exfiltration_confirmed (HIGH severity, confidence 1.0) — confirmed SSRF/exfiltration.
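The canary idea reduces to two steps: mint a URL that nothing else could know, then check whether it was ever requested. The sketch below illustrates only that logic. The base URL, function names, and the `observed_paths` list (standing in for whatever a callback listener recorded) are assumptions, not the behavior of `evals.py`.

```python
import uuid


def make_canary(base_url="https://canary.example.invalid"):
    """Create a unique token and the URL that embeds it (hypothetical host)."""
    token = uuid.uuid4().hex
    return token, f"{base_url}/c/{token}"


def evaluate_canary(token, observed_paths):
    """Return a finding if any recorded callback request carried the token.

    observed_paths: request paths seen by the callback listener while the
    tool under test was running.
    """
    if any(token in path for path in observed_paths):
        return {
            "finding": "exfiltration_confirmed",
            "severity": "HIGH",
            "confidence": 1.0,
        }
    return None


token, url = make_canary()
# The URL would be planted in a read-only tool's input; a hit on the
# listener means the server made an outbound request it had no reason to make.
hit = evaluate_canary(token, [f"/c/{token}"])
miss = evaluate_canary(token, ["/health"])
```

Because the token is random and single-use, a matching request is direct evidence of an outbound fetch, which is why the finding carries full confidence rather than a heuristic score.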
