

Every time a Kubernetes cluster spins up a new inference replica to handle a traffic spike, it pays a steep tax: the cold start. That is the full sequence a model server must complete before it can serve its first request, which includes pulling the container image, loading model weights into GPU memory, warming up CUDA kernels, compiling CUDA graphs, and registering with the service discovery layer. For large models, that bill can run into minutes. NVIDIA's answer is Dynamo Snapshot, a checkpoint/restore system that skips the entire cold-start sequence and brings a fully warm inference worker back to life in seconds.
The GPU idle problem nobody talks about
In production inference deployments, demand fluctuates over time, so inference replicas must scale elastically. Cold-starting inference workloads on Kubernetes can take several minutes. During that time, GPUs are allocated but idle, generating no tokens and serving no requests. The cost is more than wasted compute: the delay also raises the risk of SLA violations during traffic spikes, because the system cannot scale out fast enough to absorb a sudden increase in demand.
NVIDIA Dynamo Snapshot is a checkpoint/restore system for AI inference workloads on Kubernetes. It serializes the full state of a running inference worker, both GPU-side and CPU-side, and restores it on the same node or a different one, skipping the cold-start sequence entirely. The key insight is that the cold-start cost only needs to be paid once. After that, every scale-out event restores from the frozen snapshot instead of booting from scratch.
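To make the pay-once argument concrete, here is a small back-of-the-envelope sketch. The function name and the timings (a 180-second cold start, a 10-second restore) are illustrative assumptions, not measurements from NVIDIA, and the model ignores the one-time cost of taking the checkpoint itself.

```python
from typing import Optional


def idle_gpu_seconds(scale_events: int,
                     cold_start_s: float,
                     restore_s: Optional[float] = None) -> float:
    """Total seconds GPUs sit allocated but idle across scale-out events.

    Without a snapshot (restore_s is None), every event pays the full
    cold start. With a snapshot, only the first event pays it; each
    later event pays the (much shorter) restore time instead.
    """
    if scale_events <= 0:
        return 0.0
    if restore_s is None:
        return scale_events * cold_start_s
    return cold_start_s + (scale_events - 1) * restore_s


# Hypothetical numbers: 20 scale-out events, 180 s cold start, 10 s restore.
without_snapshot = idle_gpu_seconds(20, 180.0)
with_snapshot = idle_gpu_seconds(20, 180.0, restore_s=10.0)
print(without_snapshot, with_snapshot)  # 3600.0 370.0
```

Under these assumed numbers, idle GPU time per replica drops from an hour to a little over six minutes, and the gap widens as the number of scale-out events grows.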
Two tools, one frozen worker
A running inference worker has two distinct types of state, and both need to be captured. Dynamo Snapshot uses one tool per type. cuda-checkpoint serializes GPU device state (CUDA contexts, streams, device memory, virtual address mappings) into the CPU memory of the process that owns each CUDA context, using the checkpointing capability of the CUDA driver. CRIU (Checkpoint/Restore in Userspace) walks the Linux kernel's bookkeeping and serializes the host-side process tree (CPU memory, threads, file descriptors, namespaces) to disk. Because the GPU state is parked in ordinary host memory at that point, the CRIU dump captures it along with the rest of the process.
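The division of labor between the two tools can be sketched with their standalone command lines. This is a minimal illustration that follows the manual cuda-checkpoint plus CRIU workflow; how Dynamo Snapshot actually drives the tools inside Kubernetes is not described here and may differ. The helper names, the PID, and the image directory are hypothetical, and the functions only build the commands rather than run them.

```python
from typing import List


def checkpoint_commands(pid: int, images_dir: str) -> List[List[str]]:
    """Commands to freeze a worker: GPU state first, then the host process."""
    return [
        # Move CUDA state (contexts, device memory, ...) into host memory.
        ["cuda-checkpoint", "--toggle", "--pid", str(pid)],
        # Dump the process tree, which now holds the GPU state, to disk.
        ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir],
    ]


def restore_commands(pid: int, images_dir: str) -> List[List[str]]:
    """Commands to thaw a worker: host process first, then GPU state."""
    return [
        # Recreate the process tree from the on-disk images.
        ["criu", "restore", "--restore-detached", "--images-dir", images_dir],
        # Toggle again to push the saved CUDA state back onto the GPU.
        ["cuda-checkpoint", "--toggle", "--pid", str(pid)],
    ]


# Hypothetical worker PID and snapshot directory, for illustration only.
for cmd in checkpoint_commands(4242, "/var/snapshots/worker-0"):
    print(" ".join(cmd))
for cmd in restore_commands(4242, "/var/snapshots/worker-0"):
    print(" ".join(cmd))
```

The ordering is the point of the sketch: the GPU side is checkpointed before the CRIU dump and resumed after the CRIU restore, so CRIU only ever sees a process whose state lives entirely in host memory. Running these commands for real would typically require elevated privileges and a driver that supports CUDA checkpointing, for example by passing each list to `subprocess.run(cmd, check=True)`.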
