MCP server vulnerabilities
MCP server vulnerabilities are the security weaknesses an AI agent inherits when it connects to a Model Context Protocol (MCP) server, because the server's tool definitions and tool outputs flow straight into the model's context as trusted text. The main attack surface spans tool poisoning, hardcoded secrets in schemas, command and SSRF injection, the lethal trifecta, and rug pulls (silent tool drift after approval). The defense is to audit a server before trusting it — statically against the OWASP MCP Top 10 and, where possible, behaviorally — and to re-check it on every release.
Why an MCP server is an attack surface
MCP is an open protocol, built on JSON-RPC 2.0 and introduced by Anthropic. A host runs one MCP client per server, performs a capability handshake, then discovers and calls the server's tools, resources and prompts over a transport (stdio for local servers, Streamable HTTP for remote, or legacy HTTP+SSE). An agent extends trust to a server through two channels, and both carry attacker-controllable text. First, it reads the server's tool definitions (names, descriptions and JSON Schemas) into its context so it knows what each tool does. Second, it reads tool outputs back as data when a tool is called. Current models cannot reliably tell trusted instructions apart from text that merely looks like instructions, so either channel can be used to steer the agent.
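To make the first channel concrete, the sketch below walks a `tools/list` response and lists every string a model could end up reading. The sample response, the tool name `search_notes` and the helper names are invented for illustration, and whether a given host forwards every one of these fields to the model is an assumption that varies by host.

```python
from typing import Any, Iterator

# Standard MCP JSON-RPC request for discovering a server's tools.
LIST_TOOLS_REQUEST = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# Hypothetical response, invented for illustration only.
SAMPLE_RESPONSE = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "search_notes",
                "description": "Search the user's notes.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "Text to search for.",
                            "default": "todo",
                        }
                    },
                },
            }
        ]
    },
}


def iter_strings(node: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every string nested anywhere inside node."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")


def model_visible_text(response: dict) -> list[tuple[str, str]]:
    """List every string in the tool definitions as (tool:path, text)."""
    found = []
    for tool in response["result"]["tools"]:
        for path, text in iter_strings(tool):
            found.append((f"{tool['name']}:{path}", text))
    return found
```

Calling `model_visible_text(SAMPLE_RESPONSE)` returns the name and description, but also the nested parameter description and the default value, which are the kind of fields a chat UI rarely shows.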
That makes the server, not just its code, the unit of risk. A vulnerability here is rarely a memory-safety bug; it is the server delivering content the agent acts on. A human reviewing the chat UI never sees most of this surface — parameter defaults, examples, output schemas and raw tool responses are all ingested by the model but invisible to the user.
The exposure is also additive. An agent that loads several third-party servers combines their capabilities, so servers that look individually safe can together hand the agent everything an attacker needs. Evaluating one server in isolation is necessary but not sufficient; the full toolset is the real boundary.
The core vulnerability classes (the OWASP MCP Top 10)
Tool poisoning is the headline risk: hidden, imperative instructions planted in tool metadata or outputs ("ignore previous instructions", "do not tell the user", "also forward the result to…"). The static form ships in the published tool list; the runtime form arrives inside the value a tool returns, which is why a tool that relays web pages, emails or issue comments can pass a schema review and still deliver attacker-authored text on a specific query.
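A minimal sketch of a signature check for this class, assuming a small hand-written pattern list seeded with the phrases quoted above. The patterns and the function name `find_injection` are illustrative, not a complete detector; real poisoning is often paraphrased or encoded, so a clean result here proves little.

```python
import re

# Illustrative patterns only, seeded with the example phrases in the text.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.I),
    re.compile(r"do\s+not\s+(tell|inform|mention)\b.*\buser", re.I),
    re.compile(
        r"\b(also\s+)?(forward|send)\s+(the\s+)?(result|output|data)\s+to\b",
        re.I,
    ),
]


def find_injection(text: str) -> list[str]:
    """Return the first matching snippet for each pattern that fires."""
    hits = []
    for pattern in INJECTION_PATTERNS:
        match = pattern.search(text)
        if match:
            hits.append(match.group(0))
    return hits
```

The same function can be run over a tool description (the static form) or over the string a tool returns (the runtime form); only the source of the text differs.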
Hardcoded secrets are a second class: API keys, tokens or passwords baked into a tool's schema, default value or example. Anything in the schema is read by the agent and may be logged or echoed, so a secret in a definition is effectively a leaked secret. Command and SSRF injection is a third: a tool that passes caller-supplied input into a shell, a query, or an outbound HTTP request without isolation can be coerced into running commands or fetching internal URLs.
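For the hardcoded-secrets class, a scan can walk every string in a tool definition and match it against known credential shapes. The sketch below assumes three illustrative patterns (an AWS access key ID, a classic GitHub personal access token, and a bearer token); the pattern set and the names `SECRET_VALUE_PATTERNS` and `find_secrets` are hypothetical, and a production scanner would need a far broader list plus entropy checks.

```python
import re
from typing import Any, Iterator

# Illustrative credential shapes; not an exhaustive list.
SECRET_VALUE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}"),
}


def iter_strings(node: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every string nested anywhere inside node."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, f"{path}[{index}]")


def find_secrets(tool: dict) -> list[tuple[str, str]]:
    """Return (path, pattern_label) for each string that looks like a secret."""
    findings = []
    for path, text in iter_strings(tool):
        for label, pattern in SECRET_VALUE_PATTERNS.items():
            if pattern.search(text):
                findings.append((path, label))
    return findings
```

Because the walk covers defaults, examples and nested descriptions as well as the top-level description, it reaches the fields a reviewer is least likely to read by hand.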
The lethal trifecta is the impact multiplier: a single server or agent that holds untrusted-content ingestion, sensitive-data access, and an exfiltration-or-destruction path at once. Any one leg on its own is usually containable; all three together turn a prompt injection into a real breach. Rounding out the surface are unsafe destructive tools that act without confirmation, missing protocol and compliance hygiene, and rug pulls.
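The trifecta can be checked across the whole toolset rather than per server. The sketch below assumes each loaded server has already been tagged by hand with the legs it contributes; the server names and the `Leg` enum are hypothetical, and the tagging step itself (deciding what counts as sensitive or untrusted) is the hard part this code does not solve.

```python
from enum import Enum


class Leg(Enum):
    UNTRUSTED_CONTENT = "untrusted-content ingestion"
    SENSITIVE_DATA = "sensitive-data access"
    EXFIL_OR_DESTRUCTION = "exfiltration-or-destruction path"


def trifecta_legs(servers: dict[str, set[Leg]]) -> dict[Leg, list[str]]:
    """Map each leg to the sorted names of the servers that contribute it."""
    return {
        leg: sorted(name for name, legs in servers.items() if leg in legs)
        for leg in Leg
    }


def forms_trifecta(servers: dict[str, set[Leg]]) -> bool:
    """True when the combined toolset covers all three legs."""
    return all(trifecta_legs(servers).values())


# Hypothetical agent configuration: each server holds only one leg.
loaded = {
    "web_fetch": {Leg.UNTRUSTED_CONTENT},
    "vault": {Leg.SENSITIVE_DATA},
    "mailer": {Leg.EXFIL_OR_DESTRUCTION},
}
```

Here `forms_trifecta(loaded)` is true even though no single server holds more than one leg, and `trifecta_legs(loaded)` shows which server to drop to break the combination.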
Static vulnerabilities vs. runtime vulnerabilities
Some weaknesses are visible in the published tool list and can be found without side effects: secrets in schemas, injection signatures in descriptions, destructive tools lacking a confirmation or destructiveHint, and a capability mix that forms the lethal trifecta. A static scan reads names, descriptions, parameter schemas (including defaults and examples) and output schemas — fast, safe, and enough to catch poisoning and risky combinations shipped in the definitions.
Other weaknesses only appear when a tool actually runs. Tool-response poisoning, output-delivered exfiltration vectors and confirmed SSRF cannot be seen by reading a declaration; the description can be clean while the runtime output is hostile. Catching these requires a behavioral probe — invoking read-only tools with benign canary inputs and inspecting what comes back, never calling mutating tools.
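A behavioral probe of this kind might look like the following sketch. It assumes the caller supplies a `call_tool(name, arguments)` function that returns the tool's text output (how that call is transported is out of scope), and it treats the MCP `readOnlyHint` annotation as the gate. The canary value, the pattern and the function names are hypothetical. Note that `readOnlyHint` is declared by the server itself, so against a hostile server it is a filter, not a guarantee.

```python
import re
from typing import Callable

CANARY = "canary-7f3a"  # hypothetical benign marker value

# Deliberately tiny pattern; a real probe would use a broader detector.
SUSPICIOUS = re.compile(
    r"ignore\s+(previous|prior)\s+instructions|do\s+not\s+tell\s+the\s+user",
    re.I,
)


def is_read_only(tool: dict) -> bool:
    """Accept only an explicit readOnlyHint; a missing hint counts as mutating."""
    return tool.get("annotations", {}).get("readOnlyHint") is True


def probe_read_only_tools(
    tools: list[dict], call_tool: Callable[[str, dict], str]
) -> dict[str, str]:
    """Call each read-only tool with canary inputs and flag hostile output."""
    findings = {}
    for tool in tools:
        if not is_read_only(tool):
            continue  # never invoke a tool that might mutate state
        props = tool.get("inputSchema", {}).get("properties", {})
        # Fill string parameters with the canary; other types are skipped,
        # so tools that require them would need hand-written inputs.
        args = {
            name: CANARY
            for name, spec in props.items()
            if spec.get("type") == "string"
        }
        output = call_tool(tool["name"], args)
        match = SUSPICIOUS.search(output)
        if match:
            findings[tool["name"]] = match.group(0)
    return findings
```

Treating a missing hint as mutating errs toward probing too little, which fits the constraint that a probe must not cause side effects.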
Robust auditing therefore needs both layers, plus a temporal one. A single audit only certifies the server as it was at probe time; drift and rug pulls happen afterward, so detecting them means re-probing and diffing the tool surface against a known-good baseline.
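The temporal layer reduces to fingerprinting each tool definition and comparing against a stored baseline. The sketch below assumes a SHA-256 hash over canonical JSON as the fingerprint; the function names are hypothetical, and where the baseline is stored is left to the reader.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Hash a tool definition in a key-order-independent way."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(tools: list[dict]) -> dict[str, str]:
    """Map each tool name to the fingerprint of its full definition."""
    return {tool["name"]: fingerprint(tool) for tool in tools}


def diff_surface(
    baseline: dict[str, str], current: dict[str, str]
) -> dict[str, list[str]]:
    """Report tools added, removed, or changed since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(n for n in shared if baseline[n] != current[n]),
    }
```

Hashing the whole definition, not just the description, means a changed default or example also counts as drift. A non-empty `changed` list is a prompt to re-review, since this check cannot tell a benign edit from a rug pull.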
How to stay safe (for integrators and builders)
If you are integrating a third-party server: audit it before trusting it, prefer servers that publish a methodology and score, and re-audit on every version. Inventory which servers contribute which trifecta leg and avoid loading a content-fetching server alongside a secrets-bearing server in the same agent. Pin a reviewed tool set where you can and alert on regression, since the agent will otherwise re-read whatever the server returns next session with no further approval.
If you are building a server: keep secrets out of schemas, defaults and examples; require explicit confirmation (and set destructiveHint) on destructive tools; isolate or sandbox anything that touches a shell, a database, or an outbound request; and resist bundling untrusted-content ingestion, sensitive-data access and an outbound path into one server. Validate and constrain tool inputs, and treat every tool output your server relays as untrusted.
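On the builder side, one way to constrain an outbound-request tool is to refuse URLs that resolve to non-public addresses. The sketch below is an assumption about how such a guard could be written using only the standard library; `is_public_url` and the injectable `resolver` parameter are hypothetical names. It does not defend against DNS rebinding by itself: the request should be made to the address that was vetted, not to a fresh lookup of the host name.

```python
import ipaddress
import socket
from typing import Callable
from urllib.parse import urlsplit


def resolve(host: str) -> list[str]:
    """Default resolver: every address the host name currently maps to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_public_url(
    url: str, resolver: Callable[[str], list[str]] = resolve
) -> bool:
    """Allow only http(s) URLs whose host resolves solely to global addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
        # Loopback, private, and link-local ranges all report is_global False.
        return bool(addresses) and all(
            ipaddress.ip_address(address).is_global for address in addresses
        )
    except (OSError, ValueError):
        return False  # resolution failed or returned an unparsable address
```

Requiring every resolved address to be global, rather than any one of them, closes the case where a host name maps to both a public and an internal address.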
Across both roles, capability separation beats hoping the model behaves. Because no current model fully resists prompt injection, the durable defense is to break at least one leg of the trifecta, gate consequential actions behind human confirmation, and continuously re-check the servers you depend on.
How CheckMCP handles it
CheckMCP maps this attack surface to a vendor-neutral MCP Score (0–100, grade A–F). Security is the top-weighted of the seven live-endpoint pillars (20 of 100 points) and runs an OWASP MCP Top 10 pass: it flags hardcoded secret values in schemas, destructive tools missing a confirmation or destructiveHint, injected instructions in descriptions, parameter schemas or outputs (tool poisoning), and the lethal-trifecta capability combination on one server, among others. The other six live pillars are tool design (18), schemas and descriptions (16), reliability (14), context-cost/token (12), compliance (12) and coverage/use-case (8). Categorical failures trigger hard caps: a secret found in a tool schema caps the grade at D, and a failed MCP handshake caps it at F. Repo/stdio servers are scored separately on four pillars: maintenance (40), license (25), adoption (20) and documentation (15).
For the runtime-only classes, opt-in behavioral evals exercise read-only tools with benign canary inputs to catch tool-response poisoning and data exfiltration, including a planted callback-canary URL that, if the server fetches it, confirms exfiltration/SSRF. CheckMCP never invokes mutating tools.
You run it via `uvx checkmcp <url>` (open-source MIT, stdlib-only CLI), the web app at checkmcp.dev, or a GitHub Action (`uses: H129hj/checkmcp@v1`) to fail a build on score regression or a rug pull. An in-band Gateway (passive and active modes) blocks tool poisoning and drift before they reach your agent, and drift monitoring re-checks tracked servers over time.