Inception just posted its first-ever go-to-market hire: a Forward Deployed AI Engineer. That might sound like a routine job listing, but for a company that has spent its entire existence as a research-first lab, it's a meaningful signal. Enterprise demand for Mercury 2 has grown faster than a small academic team can serve, and now Inception is building the customer-facing muscle to match.

The model that started it all

To understand why this hire matters, you need to know what Mercury 2 actually is. Mercury 2 is a family of large language models built by Inception Labs using a diffusion-based architecture rather than the autoregressive approach used by most modern LLMs. The distinction is fundamental.

Standard LLMs (think GPT, Claude, Gemini) generate text one token at a time, left to right, waiting for each token before producing the next. Mercury 2 instead refines entire output sequences in parallel through a process adapted from image diffusion. Think of it as the difference between writing a sentence word by word and sketching a rough draft, then progressively sharpening it. Autoregressive decoding tends to be slow because every new token requires another sequential pass, and in each pass the GPU spends more time moving data through memory than doing math. Diffusion models lean on parallel computation, which is what GPUs were built for.
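To make the contrast concrete, here is a toy sketch in Python that only counts sequential steps. It is not Mercury 2's algorithm: the "model" is a stand-in that already knows the target text, and the names `autoregressive_decode`, `parallel_refine_decode`, and `MASK` are hypothetical, chosen for illustration. The point is that left-to-right decoding needs one step per token, while a refinement-style decoder can fill many positions in each of a small, fixed number of steps.

```python
import math
import random

MASK = "_"  # hypothetical placeholder for a not-yet-decided position


def autoregressive_decode(target):
    """Toy left-to-right decoding: one sequential step per token."""
    out, steps = [], 0
    for tok in target:
        out.append(tok)  # stands in for one full model pass
        steps += 1
    return out, steps


def parallel_refine_decode(target, num_steps, rng):
    """Toy refinement: start fully masked, fill a batch of positions per step."""
    out = [MASK] * len(target)
    masked = list(range(len(target)))
    rng.shuffle(masked)  # positions get resolved in no particular order
    steps = 0
    for step in range(num_steps):
        remaining = num_steps - step
        k = math.ceil(len(masked) / remaining)  # spread the work evenly
        for pos in masked[:k]:
            out[pos] = target[pos]  # stands in for the model's prediction
        masked = masked[k:]
        steps += 1
    return out, steps


target = "the quick brown fox jumps over the lazy dog and runs far away".split()
ar_out, ar_steps = autoregressive_decode(target)
pr_out, pr_steps = parallel_refine_decode(target, num_steps=4, rng=random.Random(0))
print(f"autoregressive: {ar_steps} sequential steps")
print(f"parallel refinement: {pr_steps} sequential steps")
```

Both decoders end with the same text, but the sequential step count differs: it grows with output length in the first case and stays at the chosen `num_steps` in the second. A real diffusion model does more work per step, so the actual speedup depends on the hardware and the model.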

The numbers are hard to ignore

In standard benchmarks, Mercury 2 achieves approximately 1,000 tokens per second of output throughput, compared with approximately 89 tokens per second for Claude 4.5 Haiku Reasoning and approximately 71 tokens per second for GPT-5 Mini. By those figures, Mercury 2's throughput is roughly 11x that of Claude 4.5 Haiku Reasoning and 14x that of GPT-5 Mini, an advantage of more than 10x over the fastest competing models.
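The ratios follow directly from the quoted throughput figures. The short Python sketch below redoes the arithmetic and, as an added illustration, estimates how long a 500-token response would take at each rate. The 500-token length is an assumption chosen for the example, and the estimate ignores latency before the first token.

```python
# Approximate output throughput in tokens per second, as quoted above.
throughput = {
    "Mercury 2": 1000,
    "Claude 4.5 Haiku Reasoning": 89,
    "GPT-5 Mini": 71,
}

RESPONSE_TOKENS = 500  # assumed response length, for illustration only

baseline = throughput["Mercury 2"]

# How many times faster Mercury 2 is than each other model.
speedups = {
    name: baseline / rate
    for name, rate in throughput.items()
    if name != "Mercury 2"
}

# Seconds to emit RESPONSE_TOKENS at a steady rate.
seconds = {name: RESPONSE_TOKENS / rate for name, rate in throughput.items()}

for name, ratio in speedups.items():
    print(f"Mercury 2 vs {name}: {ratio:.1f}x")
for name, secs in seconds.items():
    print(f"{name}: {secs:.1f} s for {RESPONSE_TOKENS} tokens")
```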

Alpha Signal
