
Are MCP servers safe?

MCP servers are only as safe as the code behind them; there is nothing inherently safe or unsafe about the Model Context Protocol itself. The risk is that an MCP server's tool descriptions and tool outputs flow straight into your agent's context, so an untrusted or compromised server can plant instructions that hijack the agent, leak secrets, or trigger destructive actions. A first-party or audited server with clean tools, no embedded secrets, and no dangerous capability combinations is safe to use; an unvetted third-party server is not, so vet it (read the tools, check the source, scan it with a tool like CheckMCP) before you install.

The honest answer: it depends on the server

MCP (the Model Context Protocol) is an open JSON-RPC 2.0 standard introduced by Anthropic. The protocol itself is not dangerous: it is a uniform way for an AI host to perform a capability handshake with a server and then discover and call that server's tools, resources and prompts. "Are MCP servers safe?" is really the same question as "is this npm package safe?" or "is this browser extension safe?" The answer is per-server, not per-protocol.
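To make the handshake concrete, here is a minimal Python sketch of the messages a client sends over the stdio transport. On that transport each JSON-RPC message is one line of JSON. The protocol revision string and the client name are assumptions for illustration. A real client also waits for the server's initialize response before it sends the initialized notification and tools/list.

```python
import json

# Assumed protocol revision; host and server negotiate the real value.
PROTOCOL_VERSION = "2025-06-18"


def jsonrpc_request(msg_id, method, params=None):
    """JSON-RPC 2.0 request: carries an id and expects a response."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def jsonrpc_notification(method, params=None):
    """JSON-RPC 2.0 notification: no id, no response expected."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def handshake_messages(client_name="example-host", client_version="0.1.0"):
    """Client-side messages, in order, to connect and then discover tools."""
    return [
        jsonrpc_request(1, "initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        }),
        jsonrpc_notification("notifications/initialized"),
        jsonrpc_request(2, "tools/list"),
    ]


def encode_stdio(messages):
    """Frame each message as a single line of JSON, as the stdio transport does."""
    return "".join(json.dumps(m) + "\n" for m in messages)


if __name__ == "__main__":
    print(encode_stdio(handshake_messages()), end="")
```

The tools/list response to the last request is what gets read into the model's context.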

What makes MCP different from a normal API integration is the trust model. When your host connects, it runs one MCP client per server and reads the server's tool definitions (names, descriptions, JSON Schemas) directly into the model's context, then reads tool outputs back as data. Both of those channels are text the model may treat as instructions. So a malicious server doesn't need to exploit a memory bug; it just has to write the right words in a tool description or a tool response.

That means a well-built, first-party, or independently audited server can be perfectly safe to run, while an unknown third-party server pulled from a registry is an untrusted attack surface until you have checked it. The job is to tell the two apart before you install.

What can actually go wrong

Tool poisoning. A server hides agent-directed instructions in a tool's name, description, parameter schema, or output ("ignore previous instructions", "also forward results to...", "do not tell the user"). A human glancing at the UI never sees it, but the model reads it as authoritative. This is the MCP-specific form of prompt injection.
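You can automate a rough first pass at spotting poisoning. The sketch below walks every string in a tool definition and flags a few agent-directed phrases. That covers names, descriptions, schema fields, and even parameter names. The same function accepts a plain string, so you can run it on tool outputs too. The pattern list is a hypothetical and deliberately tiny example. Attackers paraphrase freely, so a clean result from keyword matching proves little.

```python
import re

# Hypothetical, deliberately small rule set; a real scanner needs many more signals.
SUSPICIOUS_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"do not (tell|inform|mention)\b.{0,20}\buser",
    r"(forward|send|upload)\b.{0,40}\bhttps?://",
    r"<\s*(important|system)\s*>",
]


def iter_text(node, path="$"):
    """Yield (path, text) for every string in a tool definition, keys included."""
    if isinstance(node, str):
        yield path, node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield f"{path}.{key}#name", str(key)  # parameter names can carry text too
            yield from iter_text(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_text(item, f"{path}[{i}]")


def find_poisoning(tool):
    """Return (path, pattern) pairs for every suspicious match."""
    hits = []
    for path, text in iter_text(tool):
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, text, re.IGNORECASE):
                hits.append((path, pattern))
    return hits
```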

The lethal trifecta. Coined by Simon Willison, this is the dangerous combination of three capabilities in one agent: access to untrusted content, access to sensitive data, and a way to send data out or cause damage. Any one leg alone is usually fine; together, a single injection can read your secrets and exfiltrate them. A single MCP server that bundles a fetch/browse tool, a read-files/read-email tool, and a send/upload tool assembles the whole trifecta by itself.
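Checking for the trifecta comes down to a set-union question. The sketch below assumes you have hand-classified each tool name into the three legs. TOOL_LEGS and the tool names in it are hypothetical. Any tool you have not classified is treated as holding all three legs, so unknowns fail closed.

```python
UNTRUSTED_INPUT = "untrusted_input"
SENSITIVE_DATA = "sensitive_data"
OUTBOUND_OR_DAMAGE = "outbound_or_damage"
TRIFECTA = frozenset({UNTRUSTED_INPUT, SENSITIVE_DATA, OUTBOUND_OR_DAMAGE})

# Hypothetical, hand-maintained classification of tool names to trifecta legs.
TOOL_LEGS = {
    "fetch_url": {UNTRUSTED_INPUT},
    "read_file": {SENSITIVE_DATA},
    # Email is both sensitive and writable by any outside sender.
    "read_email": {UNTRUSTED_INPUT, SENSITIVE_DATA},
    "send_email": {OUTBOUND_OR_DAMAGE},
    "upload_file": {OUTBOUND_OR_DAMAGE},
    "drop_table": {OUTBOUND_OR_DAMAGE},
}


def legs_for(tool_names):
    """Union of legs across tools; unclassified tools are assumed to have all three."""
    legs = set()
    for name in tool_names:
        legs |= TOOL_LEGS.get(name, TRIFECTA)
    return legs


def has_trifecta(tool_names):
    """True if the tools together cover all three legs."""
    return TRIFECTA <= legs_for(tool_names)
```

Check each server on its own, but also pass in the combined tool list of every server your agent loads. That checks the agent as a whole.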

Hardcoded secrets and unsafe tools. Some servers ship API keys or tokens inside their tool schemas and examples, or expose destructive tools (delete, drop, wipe, reset) with no confirmation step. Both are direct, immediate risks the moment the server is loaded.
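You can check both risks mechanically before the server is loaded. This sketch serializes the whole tool definition, so it catches secrets hidden in defaults or examples. It also flags tools whose names look destructive but are missing a confirm parameter or a destructiveHint. The regexes, the word list, and the confirm/confirmation parameter names are assumptions. Real secret scanners use far larger rule sets.

```python
import json
import re

# Hypothetical, minimal rule set; dedicated secret scanners ship hundreds of patterns.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}"),
}
DESTRUCTIVE_WORDS = ("delete", "drop", "wipe", "reset", "destroy", "purge")
CONFIRM_PARAMS = {"confirm", "confirmation"}  # assumed naming convention


def find_secrets(tool):
    """Names of secret patterns found anywhere in the serialized tool definition."""
    text = json.dumps(tool)  # covers descriptions, schemas, defaults and examples
    return [name for name, rx in SECRET_PATTERNS.items() if rx.search(text)]


def is_unguarded_destructive(tool):
    """True if the name looks destructive but a confirm param or destructiveHint is missing."""
    name = tool.get("name", "").lower()
    if not any(word in name for word in DESTRUCTIVE_WORDS):
        return False
    params = set(tool.get("inputSchema", {}).get("properties", {}))
    has_confirm = bool(CONFIRM_PARAMS & params)
    hinted = tool.get("annotations", {}).get("destructiveHint") is True
    return not (has_confirm and hinted)
```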

Rug pulls and silent drift. A server you audited and approved can later change what its tools/list returns (adding a hidden instruction, widening a destructive tool, or swapping behavior), and because agents re-read tool definitions every session, the change takes effect with no new approval. A one-time review does not protect you against a server that turns malicious in a later release.
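To detect a rug pull, keep a baseline of what you approved and diff every later tools/list against it. This is a minimal sketch, and it assumes you persist the snapshot yourself between runs. Each tool is hashed over canonical JSON with sorted keys. Any change to a tool's description, schema, or annotations therefore shows up as "changed", and a renamed tool shows up as one removal plus one addition.

```python
import hashlib
import json


def fingerprint(tool):
    """Stable SHA-256 of one tool definition (canonical JSON, sorted keys)."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(tools):
    """Map tool name -> fingerprint for a tools/list result."""
    return {t["name"]: fingerprint(t) for t in tools}


def diff(baseline, current):
    """Tools added, removed, or changed since the approved baseline."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    changed = sorted(n for n in baseline.keys() & current.keys() if baseline[n] != current[n])
    return {"added": added, "removed": removed, "changed": changed}
```

Treat any non-empty diff as requiring re-approval. A hash cannot tell a benign release from a malicious one, so a human or a re-run of the security checks has to look at what changed.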

How to tell if an MCP server is safe to install

Prefer first-party and well-maintained sources. A server published by the vendor whose API it wraps, or a popular open-source project with an active repo, a real license, and many users, is a far safer starting point than an anonymous server you found in a directory. Repo health (recent commits, issues addressed, a clear license) is a real signal.

Read the tools before you trust them. List the server's tools and actually read the descriptions and input schemas. Watch for instructions aimed at the model rather than at you, for any literal secret or token in a schema or example, and for destructive tools that act without a confirmation parameter. Then inventory the capability mix: does this one server (or your agent's full set of servers together) end up holding untrusted-content + sensitive-data + an outbound path? If so, you have a trifecta to break up.

Prefer local (stdio) for sensitive work and scope credentials tightly. A local stdio server runs as a subprocess on your machine, so tool traffic never crosses the network, though the server's own code can still make outbound calls, which is one more reason to read the source. Remote servers (Streamable HTTP, or the legacy HTTP+SSE transport) send tool traffic over the network, so check who operates them and what auth they use. Give every server the least privilege it needs, and don't co-load a content-fetching server alongside a secrets-bearing server in the same agent if you can avoid it.
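One concrete way to scope credentials for a local stdio server is to launch it with a minimal environment. That way it does not inherit every token in your shell. Below is a sketch using Python's subprocess module. Which variables a given server needs is an assumption you fill in. Some platforms also need a few extra variables just to start processes, for example SYSTEMROOT on Windows.

```python
import os
import subprocess


def scoped_env(allowed_keys, source=None):
    """Return PATH plus only the variables this server is allowed to see."""
    source = os.environ if source is None else source
    keep = {"PATH", *allowed_keys}
    return {k: v for k, v in source.items() if k in keep}


def launch_stdio_server(command, allowed_keys):
    """Start a local stdio MCP server without passing the rest of your environment."""
    return subprocess.Popen(
        command,  # e.g. ["some-mcp-server"]; a hypothetical server command
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        env=scoped_env(allowed_keys),
        text=True,
    )
```

This limits what the server can read from its environment. It does not sandbox the server's filesystem or network access.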

Scan it, then keep watching it. The fastest way to vet an unknown server is to run an automated audit that checks all of the above for you and re-checks it over time so a later rug pull doesn't slip through. A static read of the published tools catches poisoning and secrets shipped in the schema; a runtime probe catches the output-delivered attacks a static scan can't see; continuous monitoring catches drift.

Building your own server? Make it pass the same bar

If you are the one shipping an MCP server, the safety checklist is the inverse of the risks above. Keep secrets out of tool schemas, descriptions, defaults, and examples, and load them from the environment instead. Mark destructive tools with a destructiveHint and require explicit confirmation before they act.
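On the server side, both rules fit in a few lines of a tool handler. In this minimal sketch, the environment variable name, the delete_project tool, and the ConfirmationRequired exception are all hypothetical. The key is read from the environment at call time. The destructive handler refuses to act unless confirm is exactly true.

```python
import os


class ConfirmationRequired(Exception):
    """Raised when a destructive tool is called without explicit confirmation."""


def get_api_key(var="EXAMPLE_SERVICE_API_KEY"):
    # Hypothetical variable name. The key never appears in a tool schema,
    # description, default, or example; it is read only when needed.
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set")
    return key


def delete_project(project_id: str, confirm: bool = False) -> str:
    """Hypothetical destructive handler: refuses unless confirm is exactly True."""
    if confirm is not True:
        raise ConfirmationRequired(
            f"delete_project would permanently delete {project_id!r}; "
            "call again with confirm=true to proceed"
        )
    api_key = get_api_key()
    # Placeholder: a real handler would call the service here using api_key.
    return f"deleted {project_id}"
```

A confirm flag stops accidental calls, but the model fills in parameters too. It works best when the host also shows destructive calls to a human before sending them.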

Don't bundle untrusted-content ingestion, sensitive-data access, and an outbound or destructive path into one server; splitting those capabilities means a single injection can't become a breach. Set accurate tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) so clients can treat your tools correctly instead of assuming the worst.
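Here is what accurate annotations might look like in tools/list entries, written as Python dicts. The ticket tools are hypothetical. The defaults are the values the MCP spec's ToolAnnotations type assigns to omitted hints, which is why an unannotated tool counts as destructive and open-world. The spec also tells clients not to rely on annotations from untrusted servers, so treat them as hints, not guarantees.

```python
# Values a client assumes when a hint is omitted (MCP ToolAnnotations defaults).
ANNOTATION_DEFAULTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}

# Hypothetical tools for a server that only touches an internal ticket system.
TOOLS = [
    {
        "name": "search_tickets",
        "description": "Search support tickets by keyword; returns IDs and titles.",
        "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
        "annotations": {"readOnlyHint": True, "openWorldHint": False},
    },
    {
        "name": "close_ticket",
        "description": "Mark a ticket as closed. Closing an already-closed ticket is a no-op.",
        "inputSchema": {"type": "object", "properties": {"ticket_id": {"type": "string"}}, "required": ["ticket_id"]},
        "annotations": {
            "readOnlyHint": False,
            "destructiveHint": False,
            "idempotentHint": True,
            "openWorldHint": False,
        },
    },
]


def effective_annotations(tool):
    """Hints a client acts on: explicit values override the worst-case defaults."""
    return {**ANNOTATION_DEFAULTS, **tool.get("annotations", {})}
```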

Finally, treat the tool list as a contract: version it, and re-audit on every release so you don't accidentally ship a regression that looks like a rug pull to your users. Running an automated audit in CI turns "is my server safe?" into a gate you can't forget to check.

How CheckMCP handles it

CheckMCP exists to answer "is this MCP server safe?" with evidence instead of a guess. Paste a URL at checkmcp.dev or run uvx checkmcp <url> and it probes the live server and returns a vendor-neutral MCP Score (0-100, grade A-F) across seven weighted pillars, with security the top-weighted at 20/100 (then tool design 18, schemas/descriptions 16, reliability 14, context/token cost 12, compliance 12, use-case coverage 8).

The security pillar runs an OWASP MCP Top 10 pass: it flags hardcoded secrets in schemas (MCP01), destructive tools missing a confirmation or destructiveHint (MCP02), injected poisoning instructions in descriptions/schemas/outputs (MCP03), and the lethal-trifecta capability combination (MCP06). Categorical failures hit a hard floor: a secret in a schema or a critical injection caps the grade at D (the score is capped at 69 and flagged SECURITY_RISK), and a failed MCP handshake caps it at F, so a server can't buy back a serious security flaw with polish elsewhere.

Opt-in behavioral evals exercise only read-only tools with benign canary inputs to catch prompt injection and data exfiltration delivered through tool outputs, never invoking mutating tools.

For ongoing safety, the GitHub Action (uses: H129hj/checkmcp@v1) fails a build on a score regression or rug-pull, drift monitoring re-checks tracked servers, and an in-band Gateway sits between your agent and the server to block tool-poisoning and exfiltration in tool outputs before they reach the model (passive observe-and-log mode, or active block/strip mode).

Repos and stdio servers are graded on a separate four-pillar Repo-Quality Score (maintenance 40, license 25, adoption 20, documentation 15) so you can weigh project health too. The CLI is open-source (MIT, stdlib-only).
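To see why the hard floor matters, here is an illustrative Python sketch of a weighted score with caps. It is not CheckMCP's implementation. The pillar weights, the 69 cap, and the forced F come from the description above. The 0-1 per-pillar inputs and the A/B/C/D cutoffs (90/80/70/60) are assumptions.

```python
PILLAR_WEIGHTS = {
    "security": 20,
    "tool_design": 18,
    "schemas_descriptions": 16,
    "reliability": 14,
    "context_cost": 12,
    "compliance": 12,
    "use_case_coverage": 8,
}
assert sum(PILLAR_WEIGHTS.values()) == 100

# Assumed letter-grade cutoffs; not stated in the article.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
SECURITY_RISK_CAP = 69


def grade_for(score):
    """Map a 0-100 score to a letter under the assumed cutoffs."""
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


def mcp_score(pillars, secret_or_critical_injection=False, handshake_failed=False):
    """pillars: pillar name -> fraction in [0, 1]. Returns (score, grade, flags)."""
    score = round(sum(PILLAR_WEIGHTS[p] * pillars.get(p, 0.0) for p in PILLAR_WEIGHTS))
    flags = []
    if secret_or_critical_injection:
        score = min(score, SECURITY_RISK_CAP)  # hard floor: at most a D
        flags.append("SECURITY_RISK")
    grade = "F" if handshake_failed else grade_for(score)
    return score, grade, flags
```

Under these assumptions, a server that loses half its security points still scores 90 from the weighted sum alone. Only the cap stops a leaked secret from hiding behind good tool design.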

Are MCP servers safe? — FAQ

Are MCP servers safe to use?
It depends entirely on the individual server, not on the protocol. MCP itself is a neutral open JSON-RPC 2.0 standard; the risk is that a server's tool descriptions and outputs are read straight into your agent's context, so an untrusted or compromised server can hijack the agent, leak secrets, or trigger destructive actions. A first-party or audited server with clean tools and no dangerous capability combinations is safe; an unvetted third-party server is not until you have checked it.
Are MCP servers secure by default?
No, there is no built-in vetting. Your host trusts whatever a server returns from tools/list and re-reads it every session, so there is no lockfile-style guarantee that the tools you reviewed are the tools that run later. Security depends on the server author keeping secrets out of schemas, gating destructive tools, and not bundling risky capabilities, and on you auditing the server before and after you install it.
What are the security risks of MCP servers?
The main MCP-specific risks are tool poisoning (hidden agent-directed instructions in tool metadata or output), the lethal trifecta (one server combining untrusted content, sensitive-data access, and an exfiltration or destruction path so an injection becomes a breach), hardcoded secrets exposed in tool schemas, destructive tools that act without confirmation, and rug pulls (a trusted server silently changing its tools after approval). CheckMCP checks for all of these as part of its OWASP MCP Top 10 security pass.
How can I tell if an MCP server is safe to install?
Prefer first-party or well-maintained open-source servers, then read the tool descriptions and input schemas yourself, looking for instructions aimed at the model, literal secrets in schemas, and unconfirmed destructive tools. Inventory the capability mix to make sure no single server (or your whole agent) forms the lethal trifecta, prefer local stdio servers for sensitive data, and scope credentials tightly. The fastest path is to run an automated audit like CheckMCP (uvx checkmcp <url>), which checks all of this and re-checks the server over time.
Is it dangerous to add an unknown MCP server to Claude Desktop or my agent?
Yes, treat any unknown third-party server as untrusted until vetted. Once loaded, its tool definitions and outputs go straight into the model's context, so it can attempt prompt injection or, if it also has data access and an outbound path, exfiltration. Audit the server first, give it least privilege, and avoid co-loading a content-fetching server with a secrets-bearing one in the same agent.
Does a one-time security scan keep an MCP server safe?
No. A single scan only certifies the server as it was at that moment, and a rug pull or silent tool drift happens afterward: the server changes its tools/list and the agent trusts the new version with no re-approval. Detecting that requires continuous re-probing, capturing a baseline and diffing the tool set on each run while re-running the security checks. CheckMCP's drift monitoring, GitHub Action, and Gateway are built for exactly this ongoing check.
