#96
Inception
LLM startup competing in the fast-inference space. Inception builds diffusion-based language models (dLLMs) that generate tokens in parallel rather than one at a time, reaching more than 1,000 tokens per second on standard NVIDIA GPUs. Its Mercury model family targets coding and reasoning workloads and is served through an OpenAI-compatible API.