Perplexity's CTO Declared They're Leaving MCP — Here's What 143K/200K Token Overhead Means
At Ask 2026, Perplexity CTO Denis Yarats said MCP consumes 72% of a 200K context window just for tool metadata. They've replaced it with a single-endpoint Agent API. This is a real architecture problem, not posturing.
At the Ask 2026 conference on March 11, Perplexity CTO Denis Yarats announced that Perplexity is moving away from MCP — the Model Context Protocol that’s become the default standard for LLM tool integration. The stated reason is precise: MCP’s tool metadata overhead consumes 143K of a 200K context window before the model processes a single line of actual content.
That’s 72%. And it’s a real number worth understanding.
The MCP Token Overhead Problem
MCP works by describing available tools to the model in the system prompt or context. Each tool description includes its name, purpose, parameter schema, and usage instructions. These descriptions need to be comprehensive enough for the model to use the tools correctly. The problem is that comprehensiveness has a cost.
If you have dozens of integrated tools (search, code execution, browser, file system, calendar, email, database, external APIs), the cumulative schema size grows quickly. Perplexity operates at significant integration breadth, and at that scale the arithmetic produces exactly what Yarats described: 143K tokens consumed before the model sees the user’s question.
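To make the arithmetic concrete, here is a minimal sketch in Python. The tool definition follows MCP’s tools/list shape (name, description, inputSchema), but its content, the ~4-characters-per-token heuristic, and the ~1,400-tokens-per-fully-documented-tool figure are illustrative assumptions, not measurements from Perplexity’s stack:

```python
import json

# A tool definition in MCP's tools/list shape (name, description,
# inputSchema). The content is invented for illustration.
web_search_tool = {
    "name": "web_search",
    "description": (
        "Search the web for current information. Use this when the user "
        "asks about recent events or anything requiring up-to-date sources. "
        "Prefer specific queries over broad ones. Results include title, "
        "URL, and snippet; cite URLs in your answer."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
            "max_results": {
                "type": "integer",
                "description": "Number of results to return (1-20).",
                "default": 10,
            },
        },
        "required": ["query"],
    },
}

# Rough heuristic: ~4 characters per token for English/JSON text.
print(f"~{len(json.dumps(web_search_tool)) // 4} tokens for one modest tool")

# Production tools often carry much longer descriptions plus usage examples.
# At an assumed ~1,400 tokens per fully documented tool, the reported figure
# implies roughly a hundred tools:
print(f"143,000 / 1,400 = ~{143_000 // 1_400} tools")
print(f"Share of a 200K window: {143_000 / 200_000:.1%}")
```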
This creates two compounding problems:
Effective context shrinks. The model has less room for conversation history, retrieved documents, and intermediate reasoning. Complex multi-turn interactions that require holding significant context are constrained by what should be invisible infrastructure.
Reasoning quality degrades. Long-context degradation in LLMs isn’t uniformly distributed: content in the middle of the context window suffers most from attention dilution, the well-documented “lost in the middle” effect. If your first 143K tokens are tool schemas, your actual content starts in the worst possible position relative to the model’s attention patterns.
What Perplexity Replaced It With
Their Agent API takes the opposite architectural approach: single endpoint, single API key, and internal tool execution that doesn’t require exposing tool schemas to the context window. The tools are invoked without the model needing to see their full descriptions; the routing and parameter handling happen outside the model’s attention window.
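For contrast, here is what the single-endpoint pattern looks like from the client side. This is a sketch of the pattern only: the URL, field names, and response shape are hypothetical, not Perplexity’s actual Agent API.

```python
import requests

# The single-endpoint pattern from the client's perspective. This is NOT
# Perplexity's actual Agent API: the URL, field names, and response shape
# are hypothetical, chosen only to contrast with MCP's schema-in-context
# approach.
API_URL = "https://api.example.com/v1/agent"  # hypothetical endpoint

def ask_agent(query: str, api_key: str) -> dict:
    """Send just the user's question; tool selection and execution happen
    server-side, so no tool schemas ever enter the model's context window."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"query": query},  # no tool metadata travels with the request
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

# Contrast with an MCP client, which must fetch tools/list and place every
# schema into the model's context before the model can route a request.
result = ask_agent("What changed in the EU AI Act this week?", api_key="...")
```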
This trades flexibility for efficiency. MCP’s strength is that it’s a standardized description format that any model can interpret — you can add new tools without model-specific integration, and the model can reason about which tool to use based on the description. The Agent API approach requires the routing logic to live somewhere else, whether in the application layer or in a purpose-built orchestration system.
For Perplexity’s specific use case — a product with a well-defined, stable tool set — the tradeoff makes sense. The flexibility cost is low because they’re not dynamically adding tools at runtime. The efficiency gain is large because they operate at scale where the token overhead compounds across millions of queries.
MCP’s Actual Problem
Yarats’ phrasing at Ask 2026 — “MCP is not dead, but it needs to evolve” — frames this as a design criticism, not a rejection of the underlying concept. The concept (standardized tool description for LLMs) is valuable. The current implementation has two fixable problems:
Token bloat. Tool descriptions as they’re currently structured are verbose. Overhead on the scale Perplexity reported is an engineering problem, not a fundamental limitation of the protocol. Compressed schema formats, lazy loading (describing only the tools relevant to the current query), and hierarchical tool organization could substantially reduce the footprint.
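As one illustration of compression, compare a full JSON Schema against a one-line signature rendering of the same tool. The compact format here is an invented example, not part of the MCP specification:

```python
import json

# A full MCP-style schema (content invented for illustration):
verbose = {
    "name": "web_search",
    "description": "Search the web for current information and return "
                   "ranked results with title, URL, and snippet.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query."},
            "max_results": {"type": "integer", "default": 10},
        },
        "required": ["query"],
    },
}

# The same tool as a one-line signature. This compact format illustrates
# the idea; it is not part of the MCP specification.
compact = "web_search(query: str, max_results: int = 10) -> results  # search the web"

est = lambda s: len(s) // 4  # rough ~4 chars/token heuristic
print(f"verbose: ~{est(json.dumps(verbose))} tokens; compact: ~{est(compact)} tokens")
```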
Authentication friction. MCP’s current authentication model requires per-tool setup. This is manageable for a small number of integrations but a significant operational burden for large deployments. Unified credential management would reduce this friction without requiring protocol abandonment.
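A unified approach might look like a single credential broker that every tool integration queries, rather than each integration carrying its own auth flow. The class below is a hypothetical design sketch, not MCP’s current or proposed mechanism:

```python
import os

# A single broker resolves credentials for every tool instead of each
# integration carrying its own auth flow. Hypothetical design sketch.
class CredentialBroker:
    """One place to register and resolve tool credentials."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def register(self, tool: str, env_var: str) -> None:
        # Pull secrets from the environment; a production broker would use
        # a secrets manager and handle rotation and expiry.
        self._store[tool] = os.environ.get(env_var, "")

    def token_for(self, tool: str) -> str:
        return self._store[tool]

broker = CredentialBroker()
broker.register("calendar", "CALENDAR_TOKEN")
broker.register("database", "DATABASE_TOKEN")
# Tool invocations then ask the broker rather than running per-tool setup.
```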
Both problems have known solutions. The question is whether the MCP specification evolves fast enough to address them before alternatives capture the market.
The MCPAgentBench Context
Separately, MCPAgentBench (arXiv, December 2025) introduced a benchmark specifically for evaluating LLM performance on MCP-mediated tool use in real-world scenarios. The existence of this benchmark matters because it creates a measurement framework — tool use efficiency can now be compared across different architectures and implementations.
Early results from MCPAgentBench suggest that model performance on tool-use tasks varies significantly with context load, confirming the practical importance of the overhead problem Perplexity identified. As this benchmark matures, it will likely drive protocol optimization in the same way SWE-Bench has driven improvements in code generation capability.
What This Means for Your Architecture
If you’re building LLM applications with tool integrations, the MCP overhead problem is directly relevant to your design choices:
Audit your tool metadata footprint. Count the tokens your current tool schemas consume. If you’re above 20-30% of your context budget, you have a token efficiency problem that will compound at scale.
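A minimal audit can be a few lines, assuming tiktoken’s cl100k_base encoding as a stand-in for whatever tokenizer your model actually uses (the function name and the 25% warning threshold are illustrative):

```python
import json
import tiktoken  # pip install tiktoken

def audit_tool_footprint(tool_schemas: list[dict], context_budget: int = 200_000) -> float:
    """Return the share of the context budget consumed by tool metadata."""
    enc = tiktoken.get_encoding("cl100k_base")
    total = sum(len(enc.encode(json.dumps(t))) for t in tool_schemas)
    share = total / context_budget
    print(f"{len(tool_schemas)} tools, {total} tokens, {share:.1%} of budget")
    if share > 0.25:  # illustrative threshold from the 20-30% range above
        print("Warning: tool metadata is eating your context budget.")
    return share
```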
Consider dynamic tool discovery. Rather than loading all tool schemas on every request, implement selective loading — include only the tools relevant to the current query type. This requires a lightweight classification step but reduces context load substantially for specialized queries.
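Here is a minimal sketch of that classification step, using keyword matching as the simplest possible router (a real system might use embedding similarity or a small classifier model instead; the tool names and keywords are invented):

```python
# Minimal sketch of selective tool loading. Keyword matching stands in for
# the "lightweight classification step" described above.
TOOL_KEYWORDS = {
    "calendar": ["meeting", "schedule", "tomorrow", "calendar"],
    "web_search": ["latest", "news", "current", "who is", "what is"],
    "database": ["orders", "customers", "revenue", "records"],
}

def select_tools(query: str, all_schemas: dict[str, dict]) -> list[dict]:
    """Return only the schemas whose keywords match the query."""
    q = query.lower()
    selected = [
        all_schemas[name]
        for name, words in TOOL_KEYWORDS.items()
        if name in all_schemas and any(w in q for w in words)
    ]
    # Fall back to everything if nothing matches, so no capability is lost.
    return selected or list(all_schemas.values())
```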
Design for eventual migration. Whether MCP evolves to address these problems or is partially displaced by more efficient alternatives, the applications with clean abstraction layers between their tool integration and their LLM interface will be easier to update. Tight coupling to MCP’s current schema format creates migration debt.
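Concretely, that abstraction layer can be as small as a protocol-neutral tool definition your application owns, rendered into MCP’s format today and something leaner tomorrow. The names below (ToolSpec, to_mcp, to_compact) are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class ToolSpec:
    """Protocol-neutral tool definition owned by your application."""
    name: str
    description: str
    params: dict[str, str] = field(default_factory=dict)  # param -> type

    def to_mcp(self) -> dict:
        """Render as an MCP-style schema today..."""
        return {
            "name": self.name,
            "description": self.description,
            "inputSchema": {
                "type": "object",
                "properties": {p: {"type": t} for p, t in self.params.items()},
            },
        }

    def to_compact(self) -> str:
        """...or as a compact signature for a leaner protocol tomorrow."""
        args = ", ".join(f"{p}: {t}" for p, t in self.params.items())
        return f"{self.name}({args})  # {self.description}"
```

Only the renderers change when the wire format does; the application’s tool catalog stays put.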
The broader point: standardization creates interoperability, but standardization on a flawed design creates interoperability around the flaw. MCP is at the point where the community needs to decide whether to fix it or fragment.
Source: Versalence — Long Live MCP: Why MCP Is Facing an Evolution in 2026