Nous Research Drops Step 3.7 Flash Free Inside the World's Fastest-Growing Agent

Nous Research

May 30, 2026

2 min read

LLMS

mixture_of_experts vision_language

API

Nous Research just handed its Hermes Agent users a significant free upgrade: Step 3.7 Flash, the newest model from Chinese frontier lab StepFun, is now available at no cost for 30 days through Nous Portal. For an agent platform that has been quietly becoming the most-used app on OpenRouter, this is a meaningful model drop, and it signals a growing cross-lab partnership between two of the most interesting teams in open AI right now.

What just landed

Step 3.7 Flash is a 198B-parameter sparse Mixture-of-Experts (MoE) vision-language model. It pairs a 196B-parameter language backbone with a 1.8B-parameter Vision Transformer encoder for native image understanding, and activates approximately 11B parameters per token during inference. That last number is the key one. In MoE architectures, only a subset of "expert" sub-networks fires per forward pass, not the full network. This keeps per-token inference compute close to that of an 11B dense model while the model retains a total parameter budget of roughly 198B.
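The parameter arithmetic above can be checked in a few lines. This is only a back-of-the-envelope sketch using the figures quoted in the article. The "2 FLOPs per active parameter per token" rule of thumb is an assumption brought in for illustration, not a number from StepFun, and it ignores attention cost, the vision encoder's separate compute, and memory bandwidth.

```python
# Figures quoted in the article, in billions of parameters.
LANGUAGE_BACKBONE_B = 196.0
VISION_ENCODER_B = 1.8
ACTIVE_PER_TOKEN_B = 11.0  # "approximately 11B", so every result here is rough

# Total budget: backbone plus vision encoder, which rounds to the quoted 198B.
total_b = LANGUAGE_BACKBONE_B + VISION_ENCODER_B

# Share of the network that actually runs for a single token.
active_fraction = ACTIVE_PER_TOKEN_B / total_b

# ASSUMPTION: roughly 2 FLOPs per active parameter per generated token.
FLOPS_PER_PARAM = 2
moe_gflops_per_token = FLOPS_PER_PARAM * ACTIVE_PER_TOKEN_B
dense_gflops_per_token = FLOPS_PER_PARAM * total_b

print(f"total parameters: {total_b:.1f}B")
print(f"active share per token: {active_fraction:.1%}")
print(f"dense-equivalent compute ratio: {dense_gflops_per_token / moe_gflops_per_token:.1f}x")
```

Under these assumptions, only a small share of the weights does work on any given token, which is why the per-token compute looks like that of a much smaller dense model even though all 198B parameters still have to be stored and served.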

Engineered for high-frequency production workloads, it delivers throughput of up to 400 tokens per second and supports a 256K context window. It also offers three selectable reasoning levels (low, medium, and high) so developers can balance speed, cost, and cognitive depth.
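To get a feel for what those two numbers mean in practice, here is a small planning sketch. It treats 400 tokens per second as a best case (the article says "up to") and reads "256K" as 256,000 tokens, which is an assumption; the real limit may be 262,144 or something else. The function names are hypothetical helpers, not part of any StepFun or Nous API.

```python
PEAK_TOKENS_PER_SECOND = 400     # "up to" figure from the article, so a best case
CONTEXT_WINDOW_TOKENS = 256_000  # ASSUMPTION: "256K" read as 256,000 tokens


def min_generation_seconds(output_tokens: int) -> float:
    """Lower bound on decode time if peak throughput were sustained."""
    return output_tokens / PEAK_TOKENS_PER_SECOND


def fits_in_context(prompt_tokens: int, output_tokens: int) -> bool:
    """Check that the prompt plus the planned output stay inside the window."""
    return prompt_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    # Example: a long report as the prompt and a few thousand tokens of answer.
    print(min_generation_seconds(2_000))
    print(fits_in_context(200_000, 8_000))
```

Real latency will also depend on prompt processing time, the chosen reasoning level, and server load, none of which this sketch models.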

Built for agents, not just chat

This model was not designed to win chatbot leaderboards. It is designed to handle intensive tasks such as parsing massive financial reports in one pass, running multi-step search loops with cross-source verification, or operating concurrent coding agents in high-throughput pipelines. StepFun also baked in a novel "Advisor Mode": the model runs the agentic loop end-to-end, calling tools, reading results, and iterating, then escalates to a larger advisor model only at specific inflection points like planning or recovering from repeated failures. Most of the run stays at executor cost. With Advisor Mode enabled on SWE-Bench Verified, StepFun reports that Step 3.7 Flash reaches 97% of Claude Opus 4.6's coding performance at roughly one-ninth the per-task cost.
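The executor-plus-advisor pattern described above can be sketched as a plain control loop. This is not StepFun's implementation, whose details the article does not give. `run_with_advisor`, `RunLog`, the callback signatures, and the thresholds are all hypothetical, chosen only to show the shape of the idea: a cheap executor handles every step, and the expensive advisor is consulted once for planning and again only after repeated failures.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunLog:
    executor_calls: int = 0
    advisor_calls: int = 0
    notes: List[str] = field(default_factory=list)


def run_with_advisor(
    steps: List[str],
    executor: Callable[[str, str], bool],  # hypothetical: (step, guidance) -> success
    advisor: Callable[[str], str],         # hypothetical: prompt -> guidance text
    max_failures: int = 2,                 # failures in a row before escalating
    max_attempts: int = 6,                 # hard cap on attempts per step
) -> RunLog:
    log = RunLog()

    # Inflection point 1: planning. Consult the advisor once up front.
    guidance = advisor("plan: " + "; ".join(steps))
    log.advisor_calls += 1

    for step in steps:
        failures = 0
        for _ in range(max_attempts):
            log.executor_calls += 1
            if executor(step, guidance):
                break  # step done, stay on the cheap path
            failures += 1
            if failures >= max_failures:
                # Inflection point 2: repeated failures. Escalate for recovery advice.
                guidance = advisor("recover: " + step)
                log.advisor_calls += 1
                failures = 0
        else:
            # Attempt cap reached without success; record it and move on.
            log.notes.append(f"gave up on {step!r}")

    return log
```

In a run where most steps succeed on the first try, `executor_calls` grows with the number of steps while `advisor_calls` stays near one, which is the cost profile the article attributes to Advisor Mode.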
