‹ learn
MCP concepts

MCP server vulnerabilities

MCP server vulnerabilities are the security weaknesses an AI agent inherits when it connects to a Model Context Protocol (MCP) server, because the server's tool definitions and tool outputs flow straight into the model's context as trusted text. The main attack surface spans tool poisoning, hardcoded secrets in schemas, command and SSRF injection, the lethal trifecta, and rug pulls (silent tool drift after approval). The defense is to audit a server before trusting it — statically against the OWASP MCP Top 10 and, where possible, behaviorally — and to re-check it on every release.

Why an MCP server is an attack surface

MCP is an open JSON-RPC 2.0 protocol introduced by Anthropic. A host runs one MCP client per server, performs a capability handshake, then discovers and calls the server's tools, resources and prompts over a transport (stdio for local servers, Streamable HTTP for remote, or legacy HTTP+SSE). An agent trusts a server through two of those channels, and both carry attacker-controllable text. First, it reads the server's tool definitions — names, descriptions and JSON Schemas — into its context so it knows what each tool does. Second, it reads tool outputs back as data when a tool is called. Current models cannot reliably tell trusted instructions apart from text that merely looks like instructions, so either channel can be used to steer the agent.

That makes the server, not just its code, the unit of risk. A vulnerability here is rarely a memory-safety bug; it is the server delivering content the agent acts on. A human reviewing the chat UI never sees most of this surface — parameter defaults, examples, output schemas and raw tool responses are all ingested by the model but invisible to the user.

The exposure is also additive. An agent that loads several third-party servers combines their capabilities, so servers that look individually safe can together hand the agent everything an attacker needs. Evaluating one server in isolation is necessary but not sufficient; the full toolset is the real boundary.

The core vulnerability classes (the OWASP MCP Top 10)

Tool poisoning is the headline risk: hidden, imperative instructions planted in tool metadata or outputs ("ignore previous instructions", "do not tell the user", "also forward the result to…"). The static form ships in the published tool list; the runtime form arrives inside the value a tool returns, which is why a tool that relays web pages, emails or issue comments can pass a schema review and still deliver attacker-authored text on a specific query.

Hardcoded secrets are a second class: API keys, tokens or passwords baked into a tool's schema, default value or example. Anything in the schema is read by the agent and may be logged or echoed, so a secret in a definition is effectively a leaked secret. Command and SSRF injection is a third: a tool that passes caller-supplied input into a shell, a query, or an outbound HTTP request without isolation can be coerced into running commands or fetching internal URLs.

The lethal trifecta is the impact multiplier — a single server or agent that holds untrusted-content ingestion, sensitive-data access, and an exfiltration-or-destruction path at once. Any one leg is usually safe; all three together turn a prompt injection into a real breach. Rounding out the surface are unsafe destructive tools that act without confirmation, missing protocol and compliance hygiene, and rug pulls.

Static vulnerabilities vs. runtime vulnerabilities

Some weaknesses are visible in the published tool list and can be found without side effects: secrets in schemas, injection signatures in descriptions, destructive tools lacking a confirmation or destructiveHint, and a capability mix that forms the lethal trifecta. A static scan reads names, descriptions, parameter schemas (including defaults and examples) and output schemas — fast, safe, and enough to catch poisoning and risky combinations shipped in the definitions.

Other weaknesses only appear when a tool actually runs. Tool-response poisoning, output-delivered exfiltration vectors and confirmed SSRF cannot be seen by reading a declaration; the description can be clean while the runtime output is hostile. Catching these requires a behavioral probe — invoking read-only tools with benign canary inputs and inspecting what comes back, never calling mutating tools.

Robust auditing therefore needs both layers, plus a temporal one. A single audit only certifies the server as it was at probe time; drift and rug pulls happen afterward, so detecting them means re-probing and diffing the tool surface against a known-good baseline.

How to stay safe (for integrators and builders)

If you are integrating a third-party server: audit it before trusting it, prefer servers that publish a methodology and score, and re-audit on every version. Inventory which servers contribute which trifecta leg and avoid loading a content-fetching server alongside a secrets-bearing server in the same agent. Pin a reviewed tool set where you can and alert on regression, since the agent will otherwise re-read whatever the server returns next session with no further approval.

If you are building a server: keep secrets out of schemas, defaults and examples; require explicit confirmation (and set destructiveHint) on destructive tools; isolate or sandbox anything that touches a shell, a database, or an outbound request; and resist bundling untrusted-content ingestion, sensitive-data access and an outbound path into one server. Validate and constrain tool inputs, and treat every tool output your server relays as untrusted.

Across both roles, capability separation beats hoping the model behaves. Because no current model fully resists prompt injection, the durable defense is to break at least one leg of the trifecta, gate consequential actions behind human confirmation, and continuously re-check the servers you depend on.

How CheckMCP handles it

CheckMCP maps this attack surface to a vendor-neutral MCP Score (0–100, grade A–F). Security is the top-weighted of the seven live-endpoint pillars (weight 20/100) and runs an OWASP MCP Top 10 pass: it flags hardcoded secret values in schemas, destructive tools missing a confirmation or destructiveHint, injected instructions in descriptions, parameter schemas or outputs (tool poisoning), and the lethal-trifecta capability combination on one server, among others. The other six live pillars are tool design (18), schemas and descriptions (16), reliability (14), context-cost/token (12), compliance (12) and coverage/use-case (8). Categorical failures hit hard floors: a secret found in a tool schema caps the grade at D, and a failed MCP handshake caps it at F. For the runtime-only classes, opt-in behavioral evals exercise read-only tools with benign canary inputs to catch tool-response poisoning and data exfiltration (including a planted callback-canary URL that, if the server fetches it, confirms exfiltration/SSRF), and CheckMCP never invokes mutating tools. Repo/stdio servers are scored separately on four pillars: maintenance (40), license (25), adoption (20) and documentation (15). You run it via `uvx checkmcp <url>` (open-source MIT, stdlib-only CLI), the web app at checkmcp.dev, or a GitHub Action (`uses: H129hj/checkmcp@v1`) to fail a build on score regression or a rug pull. An in-band Gateway (passive and active modes) blocks tool poisoning and drift before it reaches your agent, and drift monitoring re-checks tracked servers over time.

MCP server vulnerabilities — FAQ

What are the security risks of MCP servers?+
The main risks are tool poisoning (hidden instructions in tool metadata or outputs), hardcoded secrets exposed in tool schemas, command and SSRF injection in tools that touch a shell or make outbound requests, the lethal trifecta (one server combining untrusted content, sensitive data and an exfiltration or destruction path), unsafe destructive tools that act without confirmation, and rug pulls where a trusted server silently changes its tools after approval. Because tool definitions and outputs flow into the agent's context as trusted text, these are the categories CheckMCP audits as the OWASP MCP Top 10.
What is the MCP server attack surface?+
It is everything the agent ingests from a server: the static tool definitions (names, descriptions, parameter schemas including defaults and examples, and output schemas) plus the runtime data tools return. Both channels carry text a model may treat as instructions, and a user reviewing the chat UI sees almost none of it. The surface is also additive across multiple loaded servers, so the real boundary is the agent's full toolset, not any single tool.
Can a third-party MCP server compromise my agent?+
Yes. A malicious or compromised server can plant instructions in its tool descriptions, schemas or outputs that steer the agent, and if that server can also reach sensitive data and send data out, an injection becomes a breach (the lethal trifecta). The defense is to audit the server before trusting it and re-audit on every release. CheckMCP runs that audit statically on every scan and, optionally, at runtime via behavioral evals.
How do I find vulnerabilities in an MCP server before trusting it?+
Audit it on both layers. A static scan reads the published tool metadata for secrets, injection signatures, missing destructive-tool confirmations and a lethal-trifecta capability mix; a behavioral probe invokes only read-only tools with benign canary inputs to catch output-delivered poisoning, exfiltration and confirmed SSRF without ever calling mutating tools. CheckMCP does both — paste a URL at checkmcp.dev or run `uvx checkmcp <url>` — and produces an explainable 0–100 MCP Score with the reason for every deduction.
What is the most dangerous MCP vulnerability?+
The combination matters more than any single flaw. Tool poisoning supplies the injection, but it only becomes a breach when paired with the lethal trifecta — one server that can read untrusted content, reach sensitive data, and exfiltrate or destroy. CheckMCP treats secret exposure and these critical security cases categorically: a secret found in a tool schema trips a hard floor that caps the grade at D regardless of how clean the rest of the server is, and a server that fails the MCP handshake is capped at F.
How do I protect against MCP rug pulls and tool drift?+
Because agents re-read tool definitions each session without re-approval, a one-time audit cannot catch a server that turns malicious later. Pin a reviewed tool set, alert on regression, and continuously re-probe — diffing the current tool surface against a known-good baseline and re-running the security checks on whatever the server now returns. CheckMCP's drift monitoring and GitHub Action fail a build on score regression or a rug pull, and its Gateway blocks drift in-band before it reaches your agent.

Related