Frontier AI models are expensive, and using them for every line of code is starting to hurt. Cognition just shipped Devin Fusion, a hybrid-model harness built into their Devin coding agent that keeps frontier-level intelligence while cutting costs by 35% on real engineering tasks. The trick is not simpler routing logic -- it is a fundamentally different architecture that runs two agents in parallel and switches models mid-task without blowing up the token cache.

The problem with model routing today

The standard approach to reducing AI inference costs is model routing: classify the task upfront, send easy work to a cheap model, and reserve the expensive frontier model for hard tasks. It sounds reasonable, but it breaks down in practice for two reasons.

  • Tasks reveal their difficulty late. A prompt that looks like a simple bug fix can turn into a multi-file race condition investigation three steps in. A router that commits to a cheap model at the start has no way to recover.
  • Cache misses are brutal. Switching models mid-session means the new model starts with no cached context. Cached input tokens cost one-tenth as much as uncached ones -- on Opus, that is $1.50/MTok cached versus $15/MTok uncached -- so every cold-start switch bills the accumulated context at the uncached rate.
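
To make the cache-miss penalty concrete, here is a rough back-of-the-envelope sketch using the two Opus prices quoted above. The 200,000-token context size and the function name are illustrative assumptions, and the calculation ignores output tokens and any cache-write surcharge.

```python
# Prices quoted in the article for Opus, in USD per million input tokens.
CACHED_PER_MTOK = 1.50
UNCACHED_PER_MTOK = 15.00


def input_cost(context_tokens: int, cached_fraction: float) -> float:
    """USD cost of reading `context_tokens` of input when `cached_fraction` of it hits the cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = context_tokens * cached_fraction
    uncached = context_tokens - cached
    return (cached * CACHED_PER_MTOK + uncached * UNCACHED_PER_MTOK) / 1_000_000


# Hypothetical session: 200,000 tokens of accumulated context.
context = 200_000
warm = input_cost(context, cached_fraction=1.0)  # same model, context fully cached
cold = input_cost(context, cached_fraction=0.0)  # switched model, nothing cached

print(f"warm step: ${warm:.2f}, cold switch: ${cold:.2f}")
```

Under these assumptions a single cold switch costs as much as ten warm steps over the same context, which is why a router that flips models freely can erase the savings it was meant to create.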

Existing workarounds like the "Smart Friend" or "Advisor" patterns -- where one model queries another for advice -- hit the same wall. Every call to the other model passes the task context in a form that model has not cached, so you pay the full uncached rate each time. Devin Fusion is built specifically to avoid this.

The sidekick: two agents, one brain

The core idea behind Fusion is running two fully capable agents simultaneously: a frontier main agent and a smaller, cheaper "sidekick" agent. Both have their own toolsets, their own shell access, and their own persistent cached context. The frontier agent acts as a tech lead -- it plans, handles ambiguity, and does final review. The sidekick does the execution.

As the task progresses, the main agent decides which subtasks to hand to the sidekick and which to do itself. The main agent should take as few actions as possible and read only what is strictly necessary. By default it delegates and monitors, while keeping the significant decisions for itself: the plan, the interpretation of ambiguity, the final review.
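
The division of labor described above can be sketched as a toy loop. This is only an illustration of the delegate-by-default idea, not Cognition's implementation: the `Agent` class, the `MAIN_ONLY` set, and the step names are all hypothetical, and no real model calls are made.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Stand-in for one agent; each keeps its own persistent context."""
    name: str
    context: list = field(default_factory=list)

    def record(self, entry: str) -> None:
        self.context.append(entry)


# Hypothetical step kinds the main agent keeps for itself:
# the plan, the interpretation of ambiguity, the final review.
MAIN_ONLY = {"plan", "resolve_ambiguity", "final_review"}


def assign(step_kind: str) -> str:
    """Delegate by default; only the significant decisions stay with the main agent."""
    return "main" if step_kind in MAIN_ONLY else "sidekick"


def run(steps):
    main, sidekick = Agent("main"), Agent("sidekick")
    for kind, description in steps:
        if assign(kind) == "main":
            main.record(f"{kind}: {description}")
        else:
            # The sidekick does the work in its own context; the main agent
            # only notes that the step was delegated, keeping its reads minimal.
            sidekick.record(f"{kind}: {description}")
            main.record(f"delegated {kind}")
    return main, sidekick


main, sidekick = run([
    ("plan", "outline the fix"),
    ("read_files", "inspect the failing module"),
    ("edit", "apply the patch"),
    ("run_tests", "execute the test suite"),
    ("final_review", "check the diff"),
])
print(len(main.context), len(sidekick.context))
```

The point of the sketch is that each agent appends only to its own context, so neither has to be rebuilt from scratch when work moves between them.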
