‹ learn
MCP concepts

Tool poisoning

MCP tool poisoning is an attack where a malicious or compromised MCP server embeds hidden instructions inside tool metadata (names, descriptions, input/output schemas) or tool-call outputs, hijacking the agent that reads them even when the user never sees the text. You detect it by statically scanning all tool metadata and by behaviorally probing live tool responses for agent-directed injection patterns.

What tool poisoning is

MCP clients feed a server's tool definitions — names, descriptions, and JSON-Schema parameters — directly into the agent's context so the model knows what each tool does. Tool poisoning abuses this trust: the attacker plants imperative instructions in that metadata ("ignore previous instructions", "<system>…", "you must always cc [email protected]", "do not tell the user") that the model reads as authoritative, even though a human reviewing the UI never sees it.

The payload can live anywhere the agent ingests text: the tool description, a parameter's description/default/example, the output schema, or — at runtime — inside the value a tool returns. That last variant, tool-response poisoning, is harder to catch because the malicious text only appears when the tool is actually called, often conditionally.

Why it is dangerous for agentic products

A poisoned tool turns a third-party MCP server into a delivery channel for prompt injection of your controlling agent. Once the model treats the injected text as instructions, it can be steered to call other tools, leak data, or silently forward results — the user stays unaware because the directive said not to disclose.

Poisoning is most dangerous when it co-occurs with capability exposure. If the same server can ingest untrusted content, reach sensitive data, and exfiltrate or destroy, an injection becomes an exploit — the lethal-trifecta condition. This is why poisoning is treated as a categorical security risk, not a quality nuance.

How to detect it

Static detection: scan every tool's name, description, parameter schemas (including defaults and examples), and output schema for known injection signatures — "ignore/disregard previous", system/INST tags, "don't tell the user", "secretly/exfiltrate", and covert-forwarding phrasing — and flag each offending tool. This catches poisoning that ships in the published tool list before any call is made.

Behavioral detection: exercise the server's read-only tools with benign canary inputs and inspect the responses for agent-directed instructions, multilingual injection phrasing, exfiltration vectors, and credential-shaped strings. Pair this with a callback canary — a unique URL planted in inputs — so that an outbound fetch of that URL confirms exfiltration rather than merely suggesting it.

How CheckMCP handles it

CheckMCP detects tool poisoning in two complementary passes. Statically, the Security pillar (the top-weighted of the seven, weight 20) runs an OWASP MCP Top 10 audit in security.py: its INJECT regex scans each tool's description, parameter schemas (property name, description, default, examples) and output schema, and raises a CRITICAL MCP03 finding — "injected instruction (poisoning) in description/schema/output" — for every offending tool. Any MCP03 finding trips a hard floor (as does an MCP01 hardcoded-secret finding or a detected lethal trifecta): score.py caps the final MCP Score at 69 and flags the report SECURITY_RISK, so a poisoned server lands at grade D at best, no matter how clean the rest is. Behaviorally (the opt-in canary sandbox in evals.py, labelled tier T4), CheckMCP calls only read-only-safe tools with a benign CANARY input and analyzes each response: an INJECTION match yields an active_prompt_injection finding flagged as tool-response poisoning (confidence up to 0.95), while a planted callback-canary URL that the server fetches produces a confidence-1.0 exfiltration_confirmed finding. CheckMCP never invokes mutating tools.

Tool poisoning — FAQ

What is the difference between tool poisoning and prompt injection?+
Prompt injection is the general technique of smuggling instructions into text an LLM reads. Tool poisoning is the MCP-specific delivery channel: the malicious instructions live in tool metadata (descriptions, schemas) or tool outputs, so they reach the agent through the MCP tool layer rather than through user-visible content.
Can tool poisoning hide in places a human reviewer won't see?+
Yes. The agent ingests the full tool definition — including parameter defaults, examples, and output schemas — and the raw values tools return, none of which a user normally inspects. CheckMCP's static audit scans all of those metadata surfaces (description, parameter schema fields, output schema), and its opt-in behavioral pass inspects live tool responses for the runtime variant.
Does CheckMCP execute tools to find poisoning?+
For static detection, no — it scans the published tool metadata without calling anything. For runtime tool-response poisoning, its opt-in behavioral evals call only read-only-safe tools with benign canary inputs and never invoke mutating tools, so probing a server for poisoning does not trigger side effects.
How does tool poisoning relate to the lethal trifecta?+
Poisoning supplies the injection; the lethal trifecta supplies the impact. CheckMCP raises a CRITICAL MCP06 finding when one server combines untrusted-content ingestion, sensitive-data access, and exfiltration or destruction — the conditions under which a poisoning injection can actually exfiltrate data — and that combination also trips the security hard floor.

Related