MCP context cost
MCP context cost is the number of tokens a server's tool list consumes in the agent's context window. It is paid on every request, not once. Verbose descriptions, oversized JSON schemas and too many tools can quietly eat 30–50% of the available context, leaving less room for the actual task and raising both latency and cost.
Why tools/list is paid on every request
Before an agent can use a server's tools, the tool definitions (names, descriptions and full input/output JSON Schemas) are loaded into the model's context. They stay there for the model to reason over, so their token cost is incurred on essentially every turn that has the server enabled, not just the first.
Multiply that by every server an agent loads and the fixed overhead compounds: a handful of chatty servers can spend tens of thousands of tokens before the user's request is even considered.
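The compounding can be made concrete with a little arithmetic. The sketch below is illustrative only: the server names, token counts, turn count and context window size are hypothetical, and it assumes every turn re-sends the full tool definitions with no prompt caching.

```python
def fixed_overhead(tool_tokens_by_server, turns, context_window):
    """Return (tokens per turn, tokens over the session, share of the window)."""
    # Every enabled server contributes its whole tool list on each turn.
    per_turn = sum(tool_tokens_by_server.values())
    return per_turn, per_turn * turns, per_turn / context_window


# Hypothetical servers and token counts, chosen only for illustration.
servers = {"files": 6_000, "tickets": 9_500, "search": 4_500}

per_turn, per_session, share = fixed_overhead(
    servers, turns=10, context_window=200_000
)
print(per_turn, per_session, f"{share:.0%}")  # 20000 200000 10%
```

Under these assumptions, three servers already reserve a tenth of the window on every turn before any user content is added.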
What drives the cost
Three things dominate: the number of tools (sprawl), the verbosity of each description, and the size of the parameter and output schemas (deeply nested objects, long enums, redundant examples). A server exposing dozens of overlapping tools with paragraph-long descriptions is the worst case.
Well-designed servers consolidate related actions into fewer tools, write tight descriptions, and keep schemas lean — getting the same capability for a fraction of the tokens.
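To see how the three drivers add up, the following sketch splits a tool list into tool count, description tokens and schema tokens. It is a rough approximation, not a real tokenizer: the four-characters-per-token ratio is an assumption, and the ticket tools are hypothetical examples that use the standard name, description and inputSchema fields of an MCP tool definition.

```python
import json

CHARS_PER_TOKEN = 4  # assumption: crude average, not a real tokenizer


def estimate_tokens(value):
    """Rough token estimate for a string or a JSON-serializable value."""
    text = value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def cost_breakdown(tools):
    """Split the estimated cost of a tool list into the three drivers."""
    return {
        "tool_count": len(tools),
        "description_tokens": sum(estimate_tokens(t.get("description", "")) for t in tools),
        "schema_tokens": sum(estimate_tokens(t.get("inputSchema", {})) for t in tools),
    }


def lookup_tool(name, description, param, param_description=None):
    """Build a hypothetical single-parameter tool definition."""
    prop = {"type": "string"}
    if param_description:
        prop["description"] = param_description
    schema = {"type": "object", "properties": {param: prop}, "required": [param]}
    return {"name": name, "description": description, "inputSchema": schema}


# Two overlapping tools with padded descriptions (hypothetical).
verbose = [
    lookup_tool(
        "get_ticket_by_id",
        "This tool can be used to retrieve a ticket. It takes the ticket "
        "identifier and returns the ticket. Use it whenever a ticket is needed.",
        "ticket_id",
        "The identifier of the ticket that should be retrieved",
    ),
    lookup_tool(
        "get_ticket_by_key",
        "This tool can be used to retrieve a ticket. It takes the ticket key "
        "and returns the ticket. Use it whenever a ticket is needed by key.",
        "ticket_key",
        "The key of the ticket that should be retrieved",
    ),
]

# One consolidated tool with a tight description (hypothetical).
lean = [lookup_tool("get_ticket", "Fetch a ticket by id or key.", "ref")]

print(cost_breakdown(verbose))
print(cost_breakdown(lean))
```

The absolute numbers depend on the tokenizer, but the relative gap between the two lists is what matters when deciding which tools to consolidate or trim.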
Why it matters
Context is a hard budget. Tokens spent on tool boilerplate are tokens unavailable for the user's data, the conversation, and the model's reasoning — and they add latency and cost to every call. Context bloat is one of the most common, least-measured MCP problems.
Reducing it is usually low-effort and high-impact: trim descriptions, drop redundant tools, and simplify schemas.
How CheckMCP handles it
Context cost is one of CheckMCP's seven scored pillars. On every audit it measures the exact token cost of the server's tools/list response (reported as tools_list_tokens) and grades it on a curve calibrated against the real MCP ecosystem. You see not just the raw number but how the server compares to typical servers, plus the causal attribution when its schemas are the reason the score dropped.
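One simple way to picture grading on a curve is a percentile rank against a sample of other servers. The sketch below is not CheckMCP's actual scoring method, which is not described here; the function names and the sample token counts are hypothetical and exist only to illustrate the idea.

```python
from bisect import bisect_right


def percentile_rank(value, sample):
    """Percent of the sample that is at or below value."""
    ordered = sorted(sample)
    return 100 * bisect_right(ordered, value) / len(ordered)


def grade(tools_list_tokens, ecosystem_sample):
    """Lower token counts are better, so invert the percentile into a 0-100 score."""
    return 100 - percentile_rank(tools_list_tokens, ecosystem_sample)


# Hypothetical token counts, not real ecosystem data.
sample = [800, 1_500, 2_200, 3_000, 4_500, 6_000, 9_000, 14_000, 22_000, 40_000]

print(grade(2_500, sample))   # 70.0
print(grade(22_000, sample))  # 10.0
```

A relative score like this shows at a glance whether a raw token count is typical or an outlier, which the raw number alone does not.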