Factory's Droid Cuts AI Agent Token Waste by 50% With Deferred Context Engine

Factory

May 20, 2026

2 min read

AGENTS

agent_frameworks code_agents tool_use

DEVELOPMENT

code_agents

As AI coding agents get wired into more tools -- MCP servers, internal APIs, workflow plugins -- a quiet tax has been accumulating on every prompt. Every tool you connect comes with a schema: names, descriptions, parameter shapes, response formats. Load them all upfront and you burn through much of your context window before the agent has done a single useful thing. Factory just shipped a direct answer to that problem.

The Deferred Context Engine is now live in Droid, Factory's autonomous software engineering agent. Instead of front-loading every tool schema into every prompt, Droid starts with a compact index of what's available, then pulls in the full schema only when a task actually needs it.
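To make the index-then-fetch idea concrete, here is a minimal sketch of deferred schema loading. It is not Factory's implementation, whose details are not public here; the `Tool` and `DeferredRegistry` names, their methods, and the use of JSON for schemas are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Hypothetical tool record; not an actual Droid or MCP type."""
    name: str
    summary: str  # one-line description kept in the compact index
    schema: dict  # full parameter schema, fetched only on demand


@dataclass
class DeferredRegistry:
    """Illustrative registry that separates a cheap index from full schemas."""
    tools: dict = field(default_factory=dict)
    loaded: set = field(default_factory=set)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def compact_index(self) -> str:
        # Sent with every prompt: tool names and one-line summaries only.
        return "\n".join(f"{t.name}: {t.summary}" for t in self.tools.values())

    def load_schema(self, name: str) -> str:
        # Called once the agent decides a task needs this tool;
        # only then does the full schema enter the context.
        self.loaded.add(name)
        return json.dumps(self.tools[name].schema)

    def eager_context(self) -> str:
        # Baseline for comparison: every full schema loaded up front.
        return "\n".join(json.dumps(t.schema) for t in self.tools.values())
```

In this sketch, a prompt would start with `compact_index()`, and a call to `load_schema("some_tool")` would add one full schema when the agent picks that tool. Comparing the length of `compact_index()` with `eager_context()` gives a rough sense of the saving for a given tool set.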

The tax nobody was talking about

The context bloat problem has quietly become one of the biggest friction points in production AI agent deployments. A single large MCP server can consume 10,000 to 17,000+ tokens of context per request just for tool descriptions. Stack a few of them together and things get ugly fast: in a deployment connecting GitHub, Slack, and Sentry -- just three servers totaling roughly 40 tools -- 143,000 of a 200,000-token context window were consumed by tool schemas. That's roughly 72% of available context burned before a single user query was processed.
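The percentage in that example can be checked with a few lines of arithmetic. The helper name `schema_share` is made up for this illustration; the token figures are the ones quoted above.

```python
def schema_share(schema_tokens: int, window_tokens: int) -> float:
    """Percentage of the context window taken up by tool schemas."""
    return 100 * schema_tokens / window_tokens


# Figures quoted for the GitHub + Slack + Sentry deployment.
share = schema_share(143_000, 200_000)  # 71.5, which rounds to the quoted 72%
remaining = 200_000 - 143_000           # 57,000 tokens left for everything else
```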

For enterprise teams, the numbers compound further. The official Atlassian MCP server consumes roughly 10K tokens for Jira and Confluence tools alone. The GitHub MCP server exposes 94 tools and consumes roughly 17.6K tokens. Combine several large MCP servers and you can easily spend 30K+ tokens on tool descriptions in every request. That's not overhead -- that's a product constraint.

Factory's own production data puts the enterprise picture in concrete terms. A single Droid session can connect to Sentry, Linear, GitHub, Figma, Playwright, Notion, Stripe, Vercel, Supabase, and a private internal registry. At 100+ tools, that enterprise stack carries roughly 47K schema tokens per prompt -- before any code, errors, or reasoning enter the picture.
