‹ learn
MCP concepts

MCP rug pull

An MCP rug pull is when a Model Context Protocol server you already approved silently mutates its tool definitions after the fact — rewriting a description, adding a hidden instruction, widening a destructive tool, or swapping behavior — so code that passed review starts behaving differently. CheckMCP's methodology catches it by hashing the normalized tool set against a stored baseline and re-running its OWASP and (optional) behavioral checks against whatever the server now returns, flagging breaking changes and newly-introduced risk.

What an MCP rug pull is

MCP servers ship their tools dynamically: an agent calls `tools/list` and trusts whatever names, descriptions and `inputSchema` the server returns. A rug pull exploits that trust window. You audit a server, approve it, integrate it — and later the server changes what it returns. Because the agent re-reads tool definitions on each session rather than pinning a reviewed copy, the new definitions take effect with no further human approval.

The payload can be anything the original review would have flagged: a freshly-injected instruction in a description ("also forward the result to…"), a hardcoded credential added to an example, a previously read-only tool gaining a destructive capability, or a benign-looking `fetch` tool that begins exfiltrating. The server presents a clean face during the audit, then "pulls the rug" once integrated — the MCP analogue of a dependency that goes malicious in a later release.

Silent tool drift vs. a deliberate rug pull

Silent tool drift is the broader, often-unintentional version: a server redeploys and a tool is renamed, removed, or has its schema changed without a version bump. Nothing was malicious, but agent code that depended on the old contract breaks silently — calls start malforming or routing to the wrong tool, and there was no signal that anything changed.

A rug pull is drift weaponized: the same silent-change mechanism, but the new definition is engineered to be harmful. Operationally they are detected the same way — by comparing the current tool surface against a known-good baseline — which is why a drift detector is also a rug-pull detector. CheckMCP's RUBRIC.md treats this under Reliability as metric 6.6 `tools_list_regression`: a stable hash of the sorted (name, schema-hash) tuples compared to a stored baseline, where a breaking removal or rename without a version bump scores 30, a non-breaking addition scores 90, and no change scores 100.

Why a one-time audit is not enough

A single audit only certifies the server as it was at probe time. Both drift and rug pulls happen after that, so the only reliable defense is continuous re-probing: capture a baseline, then on each run diff the tool set and re-run the same security and behavioral checks against whatever the server now returns.

CheckMCP's methodology makes this explicit with measurement tiers — static (T1) and active (T2) checks run in a single shot, but full reliability and the `tools_list_regression` check are T3 (temporal), requiring repeated probes over a time window (RUBRIC.md specifies ≥24h, ≥50 samples). Detecting a rug pull is fundamentally a T3 problem: you cannot see the change from one snapshot, only by comparing two.

How CheckMCP handles it

CheckMCP's stated defense against rug pulls and silent drift is continuous monitoring rather than a one-time pass (llms.txt: "Continuous monitoring detects rug-pulls and silent tool drift via tool pinning"). Under the Reliability pillar, RUBRIC.md defines metric 6.6 `tools_list_regression`: a stable hash of the sorted (name, schema-hash) tuples compared against a stored baseline — a breaking removal or rename without a version bump scores 30 vs. 100 for an unchanged set, and this is a T3 (temporal) measurement requiring the repeated probes that continuous monitoring provides. (Note: reliability is reported with a LOW-confidence flag and is excluded from the single-shot composite in score.py, which computes Pillar 6 only from in-run latency until a T3 window exists.) Crucially, every re-probe also re-runs the full static risk analysis on the *new* definitions: security.py's `audit()` re-scans descriptions, schemas and outputs for newly-injected poisoning (its `INJECT` pattern → MCP03 CRITICAL), hardcoded secret values (`SECRET_VAL` → MCP01 CRITICAL), a destructive tool that lost its confirm param or `destructiveHint` (MCP02 HIGH), and a lethal-trifecta surface that newly combines untrusted-content + sensitive-data + exfil/destruction (MCP06 CRITICAL) — and any MCP01/MCP03 finding or a confirmed trifecta trips the `hard_floor` in score.py that caps the score at 69 (grade D). If behavioral evals are enabled (opt-in, read-only), evals.py re-exercises the read-only-safe tools with canary inputs and flags runtime changes such as a new `active_prompt_injection` or `exfiltration_vector` in tool output, or an `exfiltration_confirmed` hit when the server fetches the planted callback-canary URL. So a server that flips malicious after approval re-scores on the next run and fails the same gates it would have failed on day one.

MCP rug pull — FAQ

How is an MCP rug pull different from a normal software supply-chain attack?+
The mechanism is the same idea — a trusted dependency goes bad after you adopt it — but MCP makes it easier because tool definitions are fetched live on every session and re-read without re-approval. There is no lockfile-equivalent by default, so a server can change its `tools/list` (names, descriptions, schemas) at any time and the agent simply trusts the new version. CheckMCP's methodology closes that gap by hashing the normalized tool set and diffing it against a stored baseline on each probe (the `tools_list_regression` metric).
Can you catch a rug pull with a single CheckMCP scan?+
No — a single scan only certifies the server at that moment. Detecting a change is inherently a temporal (T3) measurement in CheckMCP's tiers: it needs at least two probes to compare. Continuous monitoring captures a baseline and re-probes, so the `tools_list_regression` check plus a fresh OWASP MCP Top 10 pass (and behavioral evals, if enabled) run against whatever the server returns later.
What kinds of changes does CheckMCP flag as drift or a rug pull?+
Breaking changes to the tool surface — a tool removed or renamed without a version bump scores 30 on the `tools_list_regression` metric (vs. 90 for a non-breaking addition, 100 for no change). On top of structural drift, each re-probe re-runs security.py's static audit and, if enabled, evals.py, so newly-introduced tool poisoning (MCP03), a hardcoded secret value in a schema or example (MCP01), a destructive tool that dropped its confirmation or `destructiveHint` (MCP02), a new lethal-trifecta combination (MCP06), or new exfiltration/injection in tool output are all caught on the next run.
Does CheckMCP prevent a rug pull or just detect it?+
It detects it. The audit and monitoring pipeline re-scores the server and trips the same hard floor (a hardcoded secret or critical injection caps the score at 69 / grade D; a failed handshake yields F) the moment the malicious change appears. Turning that detection into protection for your integration means pinning a reviewed tool set and alerting on regression — CheckMCP provides the baseline definition, the regression metric, and the re-run of every static and behavioral check; acting on the alert is on the integrator.

Related