MCP security best practices
MCP security best practices are the defensive habits that keep an AI agent safe when it connects to Model Context Protocol servers: apply least privilege to every tool, keep secrets out of tool schemas and outputs, require explicit confirmation for destructive actions, separate untrusted-content tools from sensitive-data and outbound tools (break the lethal trifecta), validate and constrain inputs, and re-audit on every release because tool definitions can silently change. Because a server's tool descriptions and outputs flow straight into the model's context, an untrusted server is an attack surface, so the goal is to limit what any one tool, or any combination of tools, can do.
Principle of least privilege for MCP tools
Least privilege is the single most important MCP practice: each tool should expose the smallest capability that does its job, and the agent should load only the tools it actually needs. A tool named read_invoice should be able to read one invoice, not query the whole database, not write, not reach the network. Over-broad tools turn a single prompt injection into a large blast radius, so scope is your primary containment boundary.
Prefer many narrow, read-only tools over a few god-tools. Mark read-only tools with the readOnlyHint annotation and destructive tools with destructiveHint so clients can handle them appropriately, for example by asking for approval before a destructive call. Scope credentials per tool rather than handing one all-powerful token to the whole server, and bind each tool to the minimum data, table, path, or API scope it requires.
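As a rough illustration of narrow, annotated tools, the sketch below shows two tool definitions shaped like entries in a tools/list result, plus a client-side filter that loads only allowlisted, read-only tools. The tool names, the TOOLS list, and the select_tools helper are hypothetical; the annotation keys readOnlyHint and destructiveHint are the ones named above. Annotations are hints supplied by the server, so a filter like this only helps with servers you already trust to label tools honestly.

```python
from typing import Any

# Shared input schema: exactly one string field, nothing else accepted.
INVOICE_ID_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {"invoice_id": {"type": "string"}},
    "required": ["invoice_id"],
    "additionalProperties": False,
}

# Hypothetical tool definitions, shaped like entries in a tools/list result.
TOOLS: list[dict[str, Any]] = [
    {
        "name": "read_invoice",
        "description": "Read a single invoice by its ID.",
        "inputSchema": INVOICE_ID_SCHEMA,
        "annotations": {"readOnlyHint": True},
    },
    {
        "name": "delete_invoice",
        "description": "Permanently delete a single invoice by its ID.",
        "inputSchema": INVOICE_ID_SCHEMA,
        "annotations": {"readOnlyHint": False, "destructiveHint": True},
    },
]


def select_tools(
    tools: list[dict[str, Any]],
    allowed_names: set[str],
    read_only: bool = True,
) -> list[dict[str, Any]]:
    """Keep only allowlisted tools; by default also drop anything not marked read-only."""
    selected = []
    for tool in tools:
        if tool["name"] not in allowed_names:
            continue
        hints = tool.get("annotations", {})
        # A missing hint is treated as "not read-only" so the default is restrictive.
        if read_only and hints.get("readOnlyHint") is not True:
            continue
        selected.append(tool)
    return selected
```

Calling select_tools(TOOLS, {"read_invoice", "delete_invoice"}) keeps only read_invoice; the destructive tool has to be requested explicitly with read_only=False.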
On the host and client side, least privilege means not auto-loading every server you have configured into every session. Recall how the protocol's three roles interact: a host runs one MCP client per server, and each client performs a capability handshake, then discovers and calls that server's tools, resources, and prompts. The unit of risk is therefore the agent's full active toolset, so enabling fewer servers per task shrinks both the attack surface and the context cost.
Keep secrets out of schemas, descriptions, and outputs
Anything in a tool's name, description, default value, example, or output schema is read into the model's context and may be logged, echoed, or surfaced to the user. A hardcoded API key, token, or private key in any of those places is effectively published. Never put a real secret value in a schema example or default; use placeholders, and inject real credentials at runtime from environment variables or a secret manager.
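One way to catch this before release is to walk every string in a tool definition and match it against known secret shapes. The sketch below is a minimal version under stated assumptions: the three patterns are illustrative only and far from complete, and find_secrets, iter_strings, load_api_key, and the BILLING_API_KEY variable name are all hypothetical.

```python
import os
import re
from typing import Any, Iterator

# Illustrative patterns only; a real scanner would use a broader, maintained rule set.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                # AWS access key ID shape
    re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),             # GitHub personal access token shape
]


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value nested inside a JSON-like structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def find_secrets(tool: dict[str, Any]) -> list[str]:
    """Return the string values in a tool definition that look like real secrets."""
    return [
        text
        for text in iter_strings(tool)
        if any(pattern.search(text) for pattern in SECRET_PATTERNS)
    ]


def load_api_key(var_name: str = "BILLING_API_KEY") -> str:
    """Read the credential at runtime instead of embedding it in a schema."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key
```

A definition whose default is a placeholder such as <YOUR_API_KEY> passes the scan, while one that embeds a key-shaped value is reported.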
The same applies to tool outputs at call time. A tool that returns raw secrets, full credential blobs, or unmasked PII hands that data straight to the model, and to anyone who can read the transcript. Mask or omit sensitive fields in responses, and treat the boundary between your backend and the agent as an untrusted egress point.
This is a categorical failure, not a style nit: a leaked credential in a tool schema is one of the highest-severity findings an audit can surface, and it caps an otherwise-clean server's grade.
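The output-masking advice above can be sketched as a small redaction pass applied to a tool result before it is returned to the agent. The SENSITIVE_KEYS set and the redact helper are hypothetical, and matching on key names alone is an assumption that will miss secrets stored under unexpected names or embedded in free text.

```python
from typing import Any

# Hypothetical list of field names to mask; adjust to your own data model.
SENSITIVE_KEYS = {"password", "api_key", "token", "secret", "ssn"}
MASK = "[REDACTED]"


def redact(value: Any) -> Any:
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: MASK if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Scalars are returned unchanged; this sketch does not scan free text.
    return value
```

The function builds a new structure rather than mutating its input, so the backend can keep the unmasked record for its own use.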
Gate destructive actions and break the lethal trifecta
Destructive or consequential tools, such as delete, drop, send, transfer, deploy, or pay, should never fire silently on model output alone. Require an explicit confirmation parameter, a dry-run mode, or human-in-the-loop approval, and advertise destructiveHint so clients know to ask. The model can be wrong or steered; a confirmation gate is what stops a hijacked agent from doing irreversible damage.
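A confirmation gate can be as simple as a tool that defaults to a dry run. The sketch below is illustrative only: INVOICES is an in-memory stand-in for a real backend, and delete_invoice and its confirm parameter are hypothetical. It also assumes the confirm flag is set by the client after the user approves, since a flag the model can set by itself adds little protection.

```python
from typing import Any

# In-memory stand-in for a real data store (assumption for this sketch).
INVOICES: dict[str, dict[str, Any]] = {
    "inv_1": {"total": 120},
    "inv_2": {"total": 75},
}


def delete_invoice(invoice_id: str, confirm: bool = False) -> dict[str, Any]:
    """Delete one invoice, but only when the caller explicitly confirms."""
    if invoice_id not in INVOICES:
        return {"status": "error", "message": f"unknown invoice: {invoice_id}"}
    if not confirm:
        # Dry run: describe the effect without performing it.
        return {"status": "needs_confirmation", "would_delete": invoice_id}
    del INVOICES[invoice_id]
    return {"status": "deleted", "invoice_id": invoice_id}
```

A first call without confirm reports what would happen and leaves the data untouched; only a second call with confirm=True removes the record.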
The deepest structural risk is the lethal trifecta: one agent, or one server, that can simultaneously ingest untrusted content, reach sensitive data, and send data out or take destructive action. Any single leg is usually safe; all three together let an instruction injected through untrusted content read your secrets and ship them out. Mitigate by breaking at least one leg: isolate content-fetching tools from secret-bearing tools, gate the outbound path, and avoid loading a web-fetching server alongside a credentials server in the same agent.
Because no current model fully resists prompt injection, defense relies on capability separation, not on the agent being careful. Inventory which server contributes which leg before you deploy.
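The inventory step can be sketched as a lookup from each server to the trifecta legs it contributes, checked against the set of servers a given agent will load. The server names, the SERVER_LEGS table, and both helpers are hypothetical; classifying a real server's tools into legs is a manual judgment this sketch does not automate.

```python
# The three legs of the lethal trifecta, as described above.
LEGS = ("untrusted_content", "sensitive_data", "outbound_or_destructive")

# Hypothetical inventory: which legs each configured server contributes.
SERVER_LEGS: dict[str, set[str]] = {
    "web_fetch": {"untrusted_content"},
    "secrets_vault": {"sensitive_data"},
    "mailer": {"outbound_or_destructive"},
    "calculator": set(),
}


def legs_present(
    active_servers: set[str],
    server_legs: dict[str, set[str]] = SERVER_LEGS,
) -> dict[str, list[str]]:
    """Map each leg to the active servers that contribute it."""
    contributors: dict[str, list[str]] = {leg: [] for leg in LEGS}
    for name in sorted(active_servers):
        # A server missing from the inventory raises KeyError on purpose:
        # an unclassified server should block deployment, not pass silently.
        for leg in server_legs[name]:
            contributors[leg].append(name)
    return contributors


def has_lethal_trifecta(
    active_servers: set[str],
    server_legs: dict[str, set[str]] = SERVER_LEGS,
) -> bool:
    """True when every leg has at least one contributing server."""
    return all(legs_present(active_servers, server_legs).values())
```

Dropping any one contributing server from the active set breaks a leg, which is the mitigation the text describes.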
Validate inputs, harden the transport, and constrain egress
Treat every tool argument as hostile input. Validate against a strict JSON Schema, reject unexpected fields, and avoid passing model-supplied strings into shells, SQL, file paths, or HTTP requests without sanitization; command injection and SSRF are real MCP failure modes when a tool builds a system call or fetches a caller-supplied URL. Allowlist destinations for any tool that makes outbound requests so it cannot be pointed at internal metadata endpoints or arbitrary hosts.
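As a minimal sketch of these checks for a hypothetical URL-fetching tool, the function below rejects unexpected fields, requires HTTPS, and compares the parsed hostname against an allowlist. ALLOWED_HOSTS, ALLOWED_FIELDS, and validate_fetch_args are assumptions for illustration. The sketch does not cover redirects or DNS rebinding, so a real tool would also need to refuse or re-validate redirects.

```python
from typing import Any
from urllib.parse import urlsplit

# Hypothetical allowlists for a tool that fetches a caller-supplied URL.
ALLOWED_HOSTS = {"api.example.com"}
ALLOWED_FIELDS = {"url"}


def validate_fetch_args(args: dict[str, Any]) -> str:
    """Return the URL if the arguments are acceptable, otherwise raise ValueError."""
    unexpected = set(args) - ALLOWED_FIELDS
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")

    url = args.get("url")
    if not isinstance(url, str):
        raise ValueError("url must be a string")

    parts = urlsplit(url)  # may itself raise ValueError on malformed input
    if parts.scheme != "https":
        raise ValueError("only https URLs are allowed")
    # Compare the parsed hostname, not a substring of the raw URL, so that
    # tricks like "https://api.example.com@evil.test/" are rejected.
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"host not allowlisted: {parts.hostname}")
    return url
```

Because the check is an allowlist rather than a blocklist, internal metadata addresses and arbitrary hosts are rejected by default without having to enumerate them.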
For remote servers reached over Streamable HTTP, secure the transport: serve over HTTPS, put the server behind OAuth 2.1 or a bearer secret, expose the standard OAuth discovery metadata so clients can authenticate correctly, and scope tokens narrowly. For local stdio servers, remember the server runs with the user's privileges on their machine, so limit filesystem and network reach accordingly.
Conform to the protocol: MCP messages are JSON-RPC 2.0, so return spec-compliant JSON-RPC errors, keep your declared capabilities honest in the handshake (do not advertise resources or prompts you do not serve), and stay close to the current protocol version so clients are not forced to fall back to weaker behavior.
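For reference, a spec-compliant JSON-RPC 2.0 error response carries the jsonrpc version, the request id, and an error object with an integer code and a message. The sketch below builds one; the error codes are the standard ones reserved by JSON-RPC 2.0, while the jsonrpc_error helper name is hypothetical.

```python
from typing import Any, Optional, Union

# Standard error codes reserved by the JSON-RPC 2.0 specification.
PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601
INVALID_PARAMS = -32602
INTERNAL_ERROR = -32603


def jsonrpc_error(
    request_id: Optional[Union[str, int]],
    code: int,
    message: str,
    data: Any = None,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 error response for the given request id.

    Pass request_id=None when the id could not be determined,
    for example after a parse error.
    """
    error: dict[str, Any] = {"code": code, "message": message}
    if data is not None:
        # "data" is optional and should never carry secrets or stack traces.
        error["data"] = data
    return {"jsonrpc": "2.0", "id": request_id, "error": error}
```

An error response contains an error member and no result member; the two are mutually exclusive in JSON-RPC 2.0.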
Re-audit on every release and watch for drift
A one-time review certifies a server only as it was at that moment. MCP clients fetch tool definitions live on each session and trust whatever the server returns, with no lockfile by default, so a server can silently rename a tool, rewrite a description, widen a destructive capability, or inject an instruction after you approved it. That silent change is tool drift; weaponized, it is a rug pull.
Defend with continuous re-probing rather than a single pass: capture a known-good baseline of the normalized tool set, then on each run diff the current surface against it and re-run your security checks against whatever the server now returns. Pin a reviewed tool set and alert on regression, and wire an audit into CI so a score drop or a new high-severity finding fails the build before it ships.
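The baseline-and-diff step can be sketched with a stable hash per tool definition. This is a minimal illustration, assuming tools are plain JSON-like dicts with a unique name; fingerprint, snapshot, and diff_snapshots are hypothetical helpers, and canonicalizing with sorted keys is one simple normalization choice among several.

```python
import hashlib
import json
from typing import Any


def fingerprint(tool: dict[str, Any]) -> str:
    """Hash a tool definition in a key-order-independent way."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(tools: list[dict[str, Any]]) -> dict[str, str]:
    """Map each tool name to the fingerprint of its full definition."""
    return {tool["name"]: fingerprint(tool) for tool in tools}


def diff_snapshots(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report tools added, removed, or changed relative to the baseline."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(name for name in shared if baseline[name] != current[name]),
    }


def has_drift(diff: dict[str, list[str]]) -> bool:
    """True when any tool was added, removed, or changed."""
    return any(diff.values())
```

In CI, a job could store the reviewed snapshot alongside the code, rebuild the snapshot from the live server on each run, and exit non-zero when has_drift returns True. A renamed tool shows up as one removal plus one addition, and a rewritten description shows up under changed.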
Apply the same discipline to third-party servers you depend on. Even individually safe servers can combine into a trifecta inside one agent, and any of them can change underneath you between releases.
How CheckMCP handles it
CheckMCP turns this checklist into a measurable, vendor-neutral MCP Score (0-100, grade A-F) for any MCP server. For live endpoints it scores seven weighted pillars, with security the top-weighted at 20 of 100 (then tool design 18, schemas and descriptions 16, reliability 14, context and token cost 12, compliance 12, and coverage or use case 8); for repo and stdio servers it scores four pillars instead (maintenance 40, license 25, adoption 20, documentation 15).
On every audit the security pass runs against the OWASP MCP Top 10, mapping directly to these practices: it flags hardcoded secrets in schemas or examples, destructive tools missing a confirmation gate, injected instructions hidden in descriptions or outputs (tool poisoning), command injection, and the lethal-trifecta capability combination. Categorical failures are enforced as hard floors: a secret found in a tool schema caps the grade at D, and a failed MCP handshake caps it at F, so a server cannot buy back a serious security failure with polish elsewhere. Beyond the static scan, behavioral evals exercise read-only tools with canary inputs to catch prompt injection and data exfiltration in tool responses, and they never invoke mutating tools.
To operationalize least privilege and drift defense, you can run it via the open-source, MIT-licensed, stdlib-only CLI (uvx checkmcp <url>), the web app at checkmcp.dev, or the GitHub Action (uses: H129hj/checkmcp@v1) to fail a build on a score regression or a rug pull. Drift monitoring re-probes tracked servers against a baseline, and the in-band Gateway (passive and active modes) can block tool poisoning and drift before they reach your agent.