For years, running a serious large language model locally meant either a beefy Linux workstation or a Mac with Apple Silicon. That changes this fall. NVIDIA has unveiled RTX Spark, a superchip for Windows PCs that the company pitches as built for the era of personal AI agents: a new class of computer that moves "from tool to teammate." The announcement, made jointly with Microsoft at Computex 2026 and followed up at Microsoft Build, is arguably the most significant shift in PC architecture in years.

What is actually inside the chip

NVIDIA is bringing two processors to market: the N1 and the higher-end N1X. The N1X pairs a 20-core Arm v9.2 CPU, split between 10 high-performance and 10 energy-efficient cores, with a Blackwell-architecture GPU carrying 6,144 CUDA cores. At full strength, the chip also offers up to 128GB of LPDDR5X RAM and up to 300 GB/s of memory bandwidth. The CPU and GPU are connected over NVLink C2C. According to NVIDIA, that link and the large memory pool give AI agents and 120-billion-parameter models enough power and space for long-running tasks with context lengths stretching to a million tokens.
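To put the 128GB and 300 GB/s figures in perspective, here is a rough back-of-envelope sketch in Python. It is not an NVIDIA benchmark: the precision levels, the helper names `weights_gb` and `decode_tokens_per_s`, and the assumption that a dense model reads every weight once per generated token are all illustrative. Mixture-of-experts models touch fewer weights per token, so their bound would be higher, and the estimate ignores the KV cache, activations, and the operating system's own memory use.

```python
# Back-of-envelope sizing. Figures marked "from the article" are NVIDIA's
# stated maximums; everything else is an illustrative assumption.
PARAMS = 120e9          # 120-billion-parameter model (from the article)
MEMORY_GB = 128         # unified memory pool (from the article)
BANDWIDTH_GB_S = 300    # peak memory bandwidth (from the article)


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def decode_tokens_per_s(weights_size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every weight is read once per token."""
    return bandwidth_gb_s / weights_size_gb


for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    size = weights_gb(PARAMS, bits)
    fits = size < MEMORY_GB
    bound = decode_tokens_per_s(size, BANDWIDTH_GB_S)
    print(f"{label}: {size:.0f} GB of weights, fits={fits}, <= {bound:.2f} tokens/s")
```

Under these assumptions, a 120-billion-parameter model fits comfortably only when quantized to around 4 bits per weight (about 60 GB), and memory bandwidth, not raw compute, caps how fast a dense model of that size can generate tokens.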

NVIDIA bills RTX Spark as powering the world's first Windows PCs purpose-built for personal agents, with 1 petaflop of AI performance, industry-leading power efficiency, full-stack NVIDIA AI and graphics technology, and up to 128GB of unified memory. The unified memory architecture -- where the CPU and GPU share a single high-bandwidth pool -- is the same design principle Apple uses in its M-series chips, and it is what makes running large models on a laptop feasible.

Why CUDA changes everything here

Most rival laptop AI chips -- Qualcomm's Snapdragon X, AMD's Strix, even Apple's M-series -- have capable NPUs, but each relies on its own vendor-specific AI stack, such as QNN, ROCm, or Core ML. CUDA tooling does not run natively on any of them. RTX Spark changes this. If your pipeline uses PyTorch with CUDA, llama.cpp with CUDA, Flash Attention, TensorRT, or the broader NVIDIA inference stack, that code runs on RTX Spark without being ported to a different API. The same CUDA code that runs on an H100 runs on RTX Spark, although binaries built for x86 hosts still need an Arm64 build.
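In practice, most PyTorch pipelines already choose their device at runtime, which is why they carry over to any CUDA-capable machine. The sketch below is a minimal illustration, not code from NVIDIA or Microsoft: `pick_device` is a hypothetical helper, and whether a CUDA-enabled PyTorch build is available for these Windows-on-Arm PCs is an assumption the article does not confirm. `torch.cuda.is_available()` is the standard PyTorch call for this check.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu".

    Hypothetical helper for illustration. PyTorch is treated as optional,
    so the function still works on machines where it is not installed.
    """
    try:
        import torch  # assumed to be a CUDA-enabled build when present
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"running on: {device}")
```

Code written this way needs no changes between a data-center GPU and a laptop: the same script picks up whichever CUDA device the local PyTorch build exposes, and falls back to the CPU when there is none.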

That is a genuinely big deal. It means the entire ecosystem of CUDA-accelerated libraries -- the tools most ML practitioners already use -- just works on a laptop, without porting or rewriting. NVIDIA's CUDA ecosystem, used by millions of developers for AI, scientific computing, and creative applications, gives these laptops a built-in software advantage that Qualcomm's chips lacked at launch.
