
Warp just shipped one of its most-requested features: the ability to plug any OpenAI-compatible inference endpoint directly into the Warp Agent. Set an endpoint URL, give it a memorable alias, and switch between models mid-session with /model [alias]. No more being locked to whatever models Warp bundles by default.
The problem this actually solves
Warp's agent is deeply integrated with your terminal state: it reads your indexed codebase, environment variables, MCP servers, rules, and notebooks without any extra wiring. That's the part developers want to keep. What they didn't want was having no say in which model sits underneath it all.
Custom endpoints are for developers who want control over inference, including the ability to experiment with providers and models Warp does not natively support yet. Bundled inference remains available for developers who want Warp to handle model access and infrastructure for them.
What you can actually plug in
You can now power Warp's agent experience with your own OpenAI, Anthropic, or Google API key, or connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, an internal gateway, or your own hosted inference setup.
Anything that speaks the OpenAI Chat Completions API (POST /v1/chat/completions) is fair game.
- OpenRouter , connects your OpenRouter URL directly to the Warp Agent, providing immediate access to models from Anthropic, OpenAI, and DeepSeek without leaving the command line.
- LiteLLM , a self-hosted proxy that exposes a unified OpenAI-compatible API across dozens of providers.
Don't miss what's next in AI
Join 300,000+ engineers and researchers who get the signal, not the noise.
- Full access to in-depth AI research breakdowns
- Be the first to know what's trending before it hits mainstream
- Daily curated papers, repos, and industry moves
