NVIDIA just shipped Metropolis Blueprint for Video Search and Summarization (VSS) 3, and the headline feature is deceptively simple: point a coding agent like Claude Code or Codex at a camera feed or a library of archived footage, type a natural language prompt, and the system does the rest. Under the hood, that simplicity is backed by a heavily rearchitected stack with 16 new agent skills, a unified open-source repo, and a new state-of-the-art 3D multi-camera tracking model.

The problem that needed solving

Enterprises, cities, and industrial operators are already drowning in video. Thousands of camera feeds run simultaneously, yet the moment something happens, the question is always the same: what actually occurred, and do I need to act? Traditional pipelines could record and store footage, but they couldn't reason about it. Getting insights out of recorded footage meant either manual review or brittle rule-based systems that generated more false alarms than actionable alerts.

VSS is organized into three areas of processing and analysis: real-time video intelligence (feature extraction, embeddings, and stream understanding with results published to a message broker), downstream analytics (enrichment of metadata into trajectories, incidents, and verified alerts), and agentic and offline processing (orchestrated tools for search, Q&A, summarization, and clip retrieval, including via the Model Context Protocol).
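To make the three areas concrete, here is a minimal Python sketch of how data might flow between them. It is an illustration only, not VSS code: the `Detection` record, the in-memory `Queue` standing in for the message broker, the rectangular zone check, and every function name are assumptions made for this example.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from queue import Queue


@dataclass
class Detection:
    """Hypothetical per-frame metadata record (not a VSS schema)."""
    camera_id: str
    object_id: int
    timestamp: float
    x: float
    y: float


def publish_detections(broker: Queue, detections: list[Detection]) -> None:
    """Area 1 (real-time): push per-frame metadata onto the broker."""
    for det in detections:
        broker.put(det)


def build_trajectories(broker: Queue) -> dict[int, list[Detection]]:
    """Area 2 (downstream analytics): group detections into per-object tracks."""
    tracks: dict[int, list[Detection]] = defaultdict(list)
    while not broker.empty():
        det = broker.get()
        tracks[det.object_id].append(det)
    for track in tracks.values():
        track.sort(key=lambda d: d.timestamp)  # order each track in time
    return dict(tracks)


def find_incidents(
    tracks: dict[int, list[Detection]],
    zone: tuple[float, float, float, float],
) -> list[int]:
    """Area 2, continued: flag objects whose track enters zone (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = zone
    return sorted(
        object_id
        for object_id, track in tracks.items()
        if any(x0 <= d.x <= x1 and y0 <= d.y <= y1 for d in track)
    )


def answer_query(incidents: list[int]) -> str:
    """Area 3 (agentic/offline): placeholder for search and summarization tools."""
    if not incidents:
        return "No objects entered the zone."
    return f"Objects {incidents} entered the zone."


if __name__ == "__main__":
    broker: Queue = Queue()
    publish_detections(
        broker,
        [
            Detection("cam1", 1, 0.0, 0.0, 0.0),
            Detection("cam1", 2, 0.0, 5.0, 5.0),
            Detection("cam1", 1, 1.0, 1.0, 1.0),
        ],
    )
    tracks = build_trajectories(broker)
    print(answer_query(find_incidents(tracks, (4.0, 4.0, 6.0, 6.0))))
```

The point of the sketch is the separation of concerns: the real-time stage only emits metadata, the analytics stage turns that metadata into higher-level facts, and the agentic stage answers questions over those facts rather than over raw frames.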

16 new agent skills: the real unlock

The biggest shift in VSS 3 is how developers interact with the system. Instead of writing custom integration code, you drop a folder of skills into your coding agent and start prompting. Agent Skills are reusable, self-contained capabilities that follow the agentskills.io specification and package the prompts, reference data, and helper scripts a coding agent needs to operate a deployed VSS Blueprint. Each Skill maps a developer intent onto the corresponding VSS REST, VA-MCP, and VIOS calls.
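For a sense of what a skill looks like on disk, the sketch below builds and validates one. It assumes the layout described by the agentskills.io specification, where a skill is a folder containing a SKILL.md file whose frontmatter carries at least a `name` and a `description`. The skill name `vss-video-search`, its instructions, and the helper functions are hypothetical and are not taken from the VSS repository. The parser only handles flat `key: value` lines, not full YAML.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Hypothetical skill file; real VSS skills will differ.
EXAMPLE_SKILL_MD = """\
---
name: vss-video-search
description: Hypothetical skill that searches archived footage with a text query.
---

# Instructions

1. Ask the user for a query and a time range.
2. Call the search helper in scripts/ and summarize the returned clips.
"""


def read_frontmatter(skill_md: str) -> dict[str, str]:
    """Parse flat 'key: value' pairs between the two '---' fences."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing fence reached
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter fence is never closed")


def load_skill(skill_dir: Path) -> dict[str, str]:
    """Read SKILL.md from a skill folder and check the required fields."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    fields = read_frontmatter(text)
    missing = [key for key in ("name", "description") if not fields.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return fields


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp) / "vss-video-search"
        skill_dir.mkdir()
        (skill_dir / "SKILL.md").write_text(EXAMPLE_SKILL_MD, encoding="utf-8")
        print(load_skill(skill_dir)["name"])
```

A coding agent does something similar when it discovers skills: it reads the short `name` and `description` first, then loads the full instructions and any bundled scripts only when a prompt calls for that skill.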

Skills are versioned alongside the blueprint in the VSS repository, exercised by a CI eval workflow on every change, and consumed by any compatible coding agent, such as Claude Code or Codex, at either deployment time or runtime. The agentskills.io format matters here because it makes the skills portable: as of 2026, tools that support the SKILL.md format include Claude Code, Codex CLI, Gemini CLI, GitHub Copilot, Cursor, and several community tools like Cline, Windsurf, and OpenCode.
