NVIDIA Wires StepFun's 198B Step 3.7 Flash Into Its Full Stack

NVIDIA AI

May 29, 2026

1 min read

LLMS

long_context mixture_of_experts

INFRA

model_serving

NVIDIA has switched on Day 0 support for StepFun's newest open multimodal model, Step 3.7 Flash, meaning the model can be prototyped, deployed, and fine-tuned across NVIDIA's developer stack the moment its weights hit Hugging Face. The rollout spans GPU-accelerated endpoints on build.nvidia.com, packaged NIM inference microservices, and ready-made NeMo fine-tuning recipes, removing most of the integration work that typically follows an open model release.

A vision-language model wired into NVIDIA's stack on arrival

Step 3.7 Flash is StepFun's latest vision-language model: a 198B-parameter Mixture-of-Experts model with approximately 11B activated parameters per forward pass. It is optimized for agentic workflows that combine perception, search, and multi-step reasoning, and it ships with native image and video input, three configurable reasoning levels, and a 256k context window.

What NVIDIA is announcing is not the model itself, but the fact that its infrastructure is ready for it on day one. Developers can pull StepFun's NVFP4-quantized checkpoint from Hugging Face, which speeds up inference by reducing memory-bandwidth and storage requirements, and run it through the open-source serving stacks for which NVIDIA maintains kernels.
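
To put the memory savings in rough numbers, here is a minimal back-of-the-envelope sketch. The parameter counts (198B total, about 11B activated) come from the article; everything else is an assumption made for illustration. In particular, the script assumes every weight is stored at the listed precision and treats NVFP4 as 4-bit values plus one 8-bit scale per 16-value block (about 4.5 effective bits per weight). The names `weight_gb` and `active_fraction` are hypothetical helpers, not part of any NVIDIA or StepFun tooling.

```python
# Back-of-the-envelope weight-memory estimate. Parameter counts come from the
# article; the bits-per-weight figures are assumptions (see the lead-in).

TOTAL_PARAMS = 198e9   # total parameters (from the article)
ACTIVE_PARAMS = 11e9   # approximate activated parameters (from the article)

# Assumed effective bits per weight. NVFP4 is modeled here as 4-bit values
# plus one 8-bit scale shared by each 16-value block: 4 + 8/16 = 4.5 bits.
BITS_PER_WEIGHT = {"bf16": 16.0, "fp8": 8.0, "nvfp4": 4.5}


def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def active_fraction(active: float, total: float) -> float:
    """Return the share of parameters that are activated."""
    return active / total


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"activated share of parameters: {share:.1%}")
    for name, bits in BITS_PER_WEIGHT.items():
        total_gb = weight_gb(TOTAL_PARAMS, bits)
        active_gb = weight_gb(ACTIVE_PARAMS, bits)
        print(f"{name:>5}: all weights {total_gb:7.1f} GB, "
              f"activated weights {active_gb:6.2f} GB")
```

Under these assumptions the full set of weights drops from roughly 396 GB in BF16 to about 111 GB in NVFP4, and only about one parameter in eighteen is activated at a time. Real checkpoints often keep some layers at higher precision, and the estimate ignores the KV cache and activations, so treat the figures as an order-of-magnitude guide rather than a sizing recommendation.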
