MCP security best practices

MCP security best practices are the defensive habits that keep an AI agent safe when it connects to Model Context Protocol servers: apply least privilege to every tool, keep secrets out of tool schemas and outputs, require explicit confirmation for destructive actions, separate untrusted-content tools from sensitive-data and outbound tools (break the lethal trifecta), validate and constrain inputs, and re-audit on every release because tool definitions can silently change. Because a server's tool descriptions and outputs flow straight into the model's context, an untrusted server is an attack surface, so the goal is to limit what any one tool, or any combination of tools, can do.

Principle of least privilege for MCP tools

Least privilege is the single most important MCP practice: each tool should expose the smallest capability that does its job, and the agent should load only the tools it actually needs. A tool named read_invoice should be able to read one invoice, not query the whole database, not write, not reach the network. Over-broad tools turn a single prompt injection into a large blast radius, so scope is your primary containment boundary.

Prefer many narrow, read-only tools over a few god-tools. Mark read-only tools with readOnlyHint and destructive tools with destructiveHint so clients can treat them correctly. Scope credentials per tool rather than handing one all-powerful token to the whole server, and bind each tool to the minimum data, table, path, or API scope it requires.
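As a rough illustration of the annotation advice above, the sketch below declares two tools in the general shape of a tools/list result and checks that each one carries a hint. The tool names, schemas, and the unannotated_tools helper are hypothetical; only the readOnlyHint and destructiveHint names come from the text, and their placement under an annotations object is an assumption based on the MCP tool annotation format.

```python
# Hypothetical tool definitions; names and schemas are illustrative only.
TOOLS = [
    {
        "name": "read_invoice",
        "description": "Read a single invoice by ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
            "additionalProperties": False,
        },
        "annotations": {"readOnlyHint": True},
    },
    {
        "name": "delete_invoice",
        "description": "Delete a single invoice by ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
            "additionalProperties": False,
        },
        "annotations": {"readOnlyHint": False, "destructiveHint": True},
    },
]


def unannotated_tools(tools):
    """Return names of tools that declare neither readOnlyHint nor destructiveHint."""
    missing = []
    for tool in tools:
        hints = tool.get("annotations", {})
        if "readOnlyHint" not in hints and "destructiveHint" not in hints:
            missing.append(tool["name"])
    return missing
```

A check like this could run in a server's test suite so that a new tool cannot ship without stating whether it mutates anything.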

On the host and client side, least privilege means not auto-loading every server you have configured into every session. Recall how the protocol's three roles interact: a host runs one MCP client per server, and each client performs a capability handshake, then discovers and calls that server's tools, resources, and prompts. The unit of risk is the agent's full active toolset, so enabling fewer servers per task shrinks both the attack surface and the context cost.
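One way a host might enable fewer servers per task is to tag each configured server and load only those that match the task at hand. The registry, tag names, and servers_for_task function below are hypothetical; real hosts configure servers in their own formats.

```python
# Hypothetical registry: server name -> the task tags it is useful for.
CONFIGURED_SERVERS = {
    "invoices": {"tags": {"billing"}},
    "web_fetch": {"tags": {"research"}},
    "vault": {"tags": {"deploy"}},
}


def servers_for_task(task_tags, registry=CONFIGURED_SERVERS):
    """Return, sorted by name, only the servers whose tags overlap the task's tags."""
    wanted = set(task_tags)
    return sorted(name for name, cfg in registry.items() if cfg["tags"] & wanted)
```

With this shape, a billing task would start a client for the invoices server only, and a task with no matching tags would start none.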

Keep secrets out of schemas, descriptions, and outputs

Anything in a tool's name, description, default value, example, or output schema is read into the model's context and may be logged, echoed, or surfaced to the user. A hardcoded API key, token, or private key in any of those places is effectively published. Never put a real secret value in a schema example or default; use placeholders, and inject real credentials at runtime from environment variables or a secret manager.
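The sketch below shows both halves of that advice under stated assumptions: a heuristic scan that walks a tool definition looking for strings shaped like credentials, and a loader that reads the real value from the environment at runtime. The two regular expressions are illustrative heuristics, not a complete detector, and the INVOICE_API_KEY variable name is hypothetical.

```python
import os
import re

# Illustrative heuristics only: an AWS-style access key ID and a PEM private key header.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_secret_like_strings(node, path="$"):
    """Recursively walk a tool definition and return paths of secret-looking strings."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_secret_like_strings(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits += find_secret_like_strings(value, f"{path}[{index}]")
    elif isinstance(node, str):
        if any(pattern.search(node) for pattern in SECRET_PATTERNS):
            hits.append(path)
    return hits


def load_api_key(env=os.environ):
    """Read the credential at runtime; INVOICE_API_KEY is a hypothetical name."""
    key = env.get("INVOICE_API_KEY")
    if not key:
        raise RuntimeError("INVOICE_API_KEY is not set")
    return key
```

A placeholder such as <YOUR_API_KEY> in a schema example passes this scan, while a key-shaped default value is reported with the path where it was found.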

The same applies to tool outputs at call time. A tool that returns raw secrets, full credential blobs, or unmasked PII hands that data straight to the model, and to anyone who can read the transcript. Mask or omit sensitive fields in responses, and treat the boundary between your backend and the agent as an untrusted egress point.
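Masking at the response boundary can be sketched as a small recursive filter applied to every tool result before it is returned. The SENSITIVE_KEYS set and the mask_sensitive function are hypothetical; a real server would derive the list from its own data model.

```python
# Hypothetical set of field names that must never reach the model unmasked.
SENSITIVE_KEYS = {"password", "token", "api_key", "ssn"}


def mask_sensitive(value):
    """Return a copy of value with sensitive fields replaced by '***'."""
    if isinstance(value, dict):
        return {
            key: "***" if key.lower() in SENSITIVE_KEYS else mask_sensitive(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [mask_sensitive(item) for item in value]
    return value
```

Matching on field names is a deliberately simple choice; it will miss secrets embedded in free-text fields, so it complements rather than replaces returning less data in the first place.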

This is a categorical failure, not a style nit: a leaked credential in a tool schema is one of the highest-severity findings an audit can surface, and it caps an otherwise-clean server's grade.

Gate destructive actions and break the lethal trifecta

Destructive or consequential tools, such as delete, drop, send, transfer, deploy, or pay, should never fire silently on model output alone. Require an explicit confirmation parameter, a dry-run mode, or human-in-the-loop approval, and advertise destructiveHint so clients know to ask. The model can be wrong or steered; a confirmation gate is what stops a hijacked agent from doing irreversible damage.
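A confirmation gate combined with a dry-run default might look like the sketch below. The delete_invoice handler, its parameter names, and the in-memory store are hypothetical; the point is only that the destructive branch is unreachable unless the caller both opts out of the dry run and confirms explicitly.

```python
def delete_invoice(invoice_id, confirm=False, dry_run=True, store=None):
    """Delete an invoice only when dry_run is off and confirm is explicitly True."""
    store = store if store is not None else {}
    if invoice_id not in store:
        return {"status": "not_found", "invoice_id": invoice_id}
    if dry_run:
        # Default path: report what would happen without changing anything.
        return {"status": "dry_run", "would_delete": invoice_id}
    if not confirm:
        return {"status": "confirmation_required", "invoice_id": invoice_id}
    del store[invoice_id]
    return {"status": "deleted", "invoice_id": invoice_id}
```

Defaulting to the safe behavior means a model that omits or garbles the arguments gets a preview rather than an irreversible change.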

The deepest structural risk is the lethal trifecta: one agent, or one server, that can simultaneously ingest untrusted content, reach sensitive data, and send data out or take destructive action. Any single leg is usually safe on its own; all three together let an instruction injected through untrusted content read your secrets and ship them out. Mitigate by breaking at least one leg: isolate content-fetching tools from secret-bearing tools, gate the outbound path, and avoid loading a web-fetching server alongside a credentials server in the same agent.

Because no current model fully resists prompt injection, defense relies on capability separation, not on the agent being careful. Inventory which server contributes which leg before you deploy.
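The inventory step can be sketched as a table of which server contributes which leg, plus a check for whether one agent's toolset completes all three. The leg labels, server names, and trifecta_report function are hypothetical, and classifying a server's legs is assumed to be a manual review step.

```python
# The three legs described above, as hypothetical labels.
LEGS = ("untrusted_content", "sensitive_data", "outbound")


def trifecta_report(server_legs):
    """Given server name -> set of legs, return (complete, contributors per leg)."""
    present = {
        leg: sorted(name for name, legs in server_legs.items() if leg in legs)
        for leg in LEGS
    }
    complete = all(present[leg] for leg in LEGS)
    return complete, present
```

If the report comes back complete, dropping or gating any one contributing server breaks the combination for that agent.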

Validate inputs, harden the transport, and constrain egress

Treat every tool argument as hostile input. Validate against a strict JSON Schema, reject unexpected fields, and avoid passing model-supplied strings into shells, SQL, file paths, or HTTP requests without sanitization; command injection and SSRF are real MCP failure modes when a tool builds a system call or fetches a caller-supplied URL. Allowlist destinations for any tool that makes outbound requests so it cannot be pointed at internal metadata endpoints or arbitrary hosts.
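The following sketch illustrates strict argument checking and a destination allowlist using only the standard library. The validate_args type map is a simplified stand-in for a real JSON Schema validator, and the ALLOWED_HOSTS value and both function names are hypothetical. The URL check does not cover redirects or DNS rebinding, which need handling at request time.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts this tool may contact.
ALLOWED_HOSTS = {"api.example.com"}


def validate_args(args, expected_types):
    """Reject unexpected, missing, or wrongly typed fields (every field is required)."""
    unexpected = set(args) - set(expected_types)
    missing = set(expected_types) - set(args)
    if unexpected or missing:
        raise ValueError(f"unexpected={sorted(unexpected)} missing={sorted(missing)}")
    for name, expected in expected_types.items():
        if not isinstance(args[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    return args


def is_allowed_url(url):
    """Allow only HTTPS URLs whose host is on the allowlist."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL
    return parts.scheme == "https" and host in ALLOWED_HOSTS
```

Because the check compares the parsed hostname rather than searching the raw string, a URL that merely contains an allowed name in its userinfo or path is still rejected.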

For remote servers reached over Streamable HTTP, secure the transport: serve over HTTPS, put the server behind OAuth 2.1 or a bearer secret, expose the standard OAuth discovery metadata so clients can authenticate correctly, and scope tokens narrowly. For local stdio servers, remember the server runs with the user's privileges on their machine, so limit filesystem and network reach accordingly.
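For the simpler bearer-secret option, a server-side check could be sketched as below. It covers only a static shared secret, not an OAuth 2.1 flow, and the is_authorized function name is hypothetical. The expected token is assumed to be injected at runtime rather than hardcoded.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True only for 'Bearer <token>' where the token matches exactly."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

The constant-time comparison is the main design choice here; a plain equality check would work functionally but can leak timing information to a remote caller.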

Conform to the protocol: MCP is built on JSON-RPC 2.0, so return spec-compliant JSON-RPC errors, keep your declared capabilities honest in the handshake (do not advertise resources or prompts you do not serve), and stay close to the current protocol version so clients are not forced into weaker behavior.
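A spec-compliant error response has a fixed shape, which the sketch below builds. The numeric codes are the standard ones reserved by JSON-RPC 2.0; the jsonrpc_error helper name is hypothetical.

```python
# Standard error codes defined by the JSON-RPC 2.0 specification.
PARSE_ERROR = -32700
INVALID_REQUEST = -32600
METHOD_NOT_FOUND = -32601
INVALID_PARAMS = -32602
INTERNAL_ERROR = -32603


def jsonrpc_error(request_id, code, message, data=None):
    """Build an error response; a response carries 'error' or 'result', never both."""
    error = {"code": code, "message": message}
    if data is not None:
        error["data"] = data
    return {"jsonrpc": "2.0", "id": request_id, "error": error}
```

Echoing the request's id lets the client match the error to the call that caused it, and keeping detail in the optional data field avoids overloading the message string.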

Re-audit on every release and watch for drift

A one-time review certifies a server only as it was at that moment. MCP clients fetch tool definitions live on each session and trust whatever the server returns, with no lockfile by default, so a server can silently rename a tool, rewrite a description, widen a destructive capability, or inject an instruction after you approved it. That silent change is tool drift; weaponized, it is a rug pull.

Defend with continuous re-probing rather than a single pass: capture a known-good baseline of the normalized tool set, then on each run diff the current surface against it and re-run your security checks against whatever the server now returns. Pin a reviewed tool set and alert on regression, and wire an audit into CI so a score drop or a new high-severity finding fails the build before it ships.
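The baseline-and-diff idea can be sketched with a canonical hash per tool. This is a minimal illustration, not how any particular auditing product implements it; the fingerprint, snapshot, and diff_tools names are hypothetical, and the tool definitions are assumed to be JSON-serializable dicts with a name field.

```python
import hashlib
import json


def fingerprint(tool):
    """Hash a canonical JSON form so key order and whitespace do not matter."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot(tools):
    """Map each tool name to the fingerprint of its full definition."""
    return {tool["name"]: fingerprint(tool) for tool in tools}


def diff_tools(baseline, current):
    """Compare two snapshots and report added, removed, and changed tool names."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "changed": sorted(name for name in shared if baseline[name] != current[name]),
    }
```

A CI job could store the reviewed snapshot alongside the code and fail when diff_tools reports any non-empty list, which forces a human to re-approve the new surface.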

Apply the same discipline to third-party servers you depend on. Even individually safe servers can combine into a trifecta inside one agent, and any of them can change underneath you between releases.

How CheckMCP handles it

CheckMCP turns this checklist into a measurable, vendor-neutral MCP Score (0-100, grade A-F) for any MCP server. For live endpoints it scores seven weighted pillars, with security the top-weighted at 20 of 100 (then tool design 18, schemas and descriptions 16, reliability 14, context and token cost 12, compliance 12, and coverage or use-case 8); for repo and stdio servers it scores four pillars instead (maintenance 40, license 25, adoption 20, documentation 15).

On every audit the security pass runs against the OWASP MCP Top 10, mapping directly to these practices: it flags hardcoded secrets in schemas or examples, destructive tools missing a confirmation gate, injected instructions hidden in descriptions or outputs (tool poisoning), command injection, and the lethal-trifecta capability combination. Categorical failures are enforced as hard floors: a secret found in a tool schema caps the grade at D, and a failed MCP handshake caps it at F, so a server cannot buy back a serious security failure with polish elsewhere. Beyond the static scan, behavioral evals exercise read-only tools with canary inputs to catch prompt injection and data exfiltration in tool responses, never invoking mutating tools.

To operationalize least privilege and drift defense, you can run it via the open-source, MIT-licensed, stdlib-only CLI (uvx checkmcp <url>), the web app at checkmcp.dev, or the GitHub Action (uses: H129hj/checkmcp@v1) to fail a build on a score regression or a rug pull. Drift monitoring re-probes tracked servers against a baseline, and the in-band Gateway (passive and active modes) can block tool poisoning and drift before they reach your agent.

MCP security best practices — FAQ

What are the most important MCP security best practices?

Apply least privilege to every tool (smallest capability, scoped credentials, read-only by default), keep secrets out of tool schemas, descriptions, examples and outputs, require explicit confirmation for destructive actions, break the lethal trifecta by separating untrusted-content tools from sensitive-data and outbound tools, validate and constrain all inputs (no unsanitized shell, SQL or URL use), secure remote transports with HTTPS and OAuth 2.1, and re-audit on every release because tool definitions can change silently.

What is the principle of least privilege for MCP tools?

It means each MCP tool should expose only the minimum capability needed for its job, and the agent should load only the tools a task actually requires. A read tool should not be able to write or reach the network; credentials should be scoped per tool rather than one all-powerful token; and destructive tools should be separated and gated. Narrow, least-privilege tools contain the blast radius if the agent is ever hijacked by a prompt injection.

How do I secure an MCP server I'm building?

Keep real secrets out of schemas, defaults, examples and outputs (inject them at runtime); mark read-only tools with readOnlyHint and require a confirmation parameter plus destructiveHint on destructive ones; validate every argument against a strict JSON Schema and never pass model-supplied input into a shell, query, file path or URL unsanitized; allowlist outbound destinations; for remote servers serve over HTTPS behind OAuth 2.1 with narrowly scoped tokens; return spec-compliant JSON-RPC 2.0 errors and keep your declared capabilities honest in the handshake; and avoid bundling untrusted-content, sensitive-data and exfiltration capabilities in one server.

What are MCP security best practices for the agent, host, and client side?

Load only the servers and tools a given task needs rather than auto-enabling everything, since the active toolset is the real attack surface and the context-cost driver. Each MCP server gets its own client and capability handshake in the host, so avoid combining a content-fetching server with a secrets-bearing server in the same agent (that assembles a lethal trifecta across servers), pin a reviewed tool set, require human approval before consequential actions, and re-audit third-party servers on every release to catch silent drift or a rug pull.

How does least privilege help against prompt injection in MCP?

Current models cannot reliably tell trusted instructions from injected ones, so you cannot rely on the agent ignoring a malicious instruction. Least privilege limits what a hijacked agent can actually do: if the tools it holds cannot reach secrets or send data out, an injection has no payoff. That is why capability separation, breaking the lethal trifecta and scoping each tool tightly, is the practical defense rather than trying to make the model perfectly injection-proof.

How often should I audit an MCP server for security?

On every release, and ideally continuously. A single audit only certifies the server as it was at that moment, but MCP clients fetch tool definitions live each session and trust whatever is returned, so a server can change its tools after approval (drift) or turn malicious (a rug pull). Capture a baseline, diff the tool surface on each probe, re-run the security checks against the new definitions, and wire an audit into CI so a score regression or new high-severity finding fails the build. Tools like CheckMCP automate this with drift monitoring and a GitHub Action.
