
Google DeepMind has introduced D4RT, short for Dynamic 4D Reconstruction and Tracking, a single transformer that pulls geometry, motion, and camera parameters out of an ordinary video in one shot. In testing, it ran 18x to 300x faster than the previous state of the art. The work is being presented at CVPR 2026, and the technical report is on arXiv.
The pitch is simple: instead of stitching together a depth estimator, a point tracker, and a pose solver, you get one feedforward model that answers a single, very general question about any pixel at any time from any viewpoint. That reframing is what unlocks both the speed and the accuracy gains.
One question to rule four dimensions
D4RT is a unified encoder-decoder transformer. The encoder first compresses the input video into a representation of the scene's geometry and motion. Where older systems ran a separate module for each task, D4RT computes only what a given request needs, through a flexible querying mechanism built around one fundamental question: "Where is a given pixel from the video located in 3D space at an arbitrary time, as viewed from a chosen camera?"
That phrasing matters because it collapses three traditionally separate computer vision tasks (depth estimation, point tracking, and camera pose estimation) into one interface. You parameterize a query by a source pixel (u, v), a source timestep t_src, a target timestep t_tgt, and a target camera t_cam, and the decoder returns that pixel's 3D position. The query also carries a local image patch around the pixel for extra spatial context.
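To make the query interface concrete, here is a minimal Python sketch. The `Query` fields follow the parameters named above; the local image patch is left out for brevity. The helpers `depth_map_queries` and `point_track_queries` are hypothetical: they show one plausible way to express a depth map and a point track as batches of queries, assuming a depth map means every pixel of frame t asked about time t in camera t, and a track means one pixel asked about every timestep in a fixed reference camera. The paper's exact recipes may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Query:
    # Fields follow the query parameters described in the text; the local
    # image patch around (u, v) is omitted to keep the sketch small.
    u: float    # source pixel column
    v: float    # source pixel row
    t_src: int  # frame the pixel is taken from
    t_tgt: int  # time at which the pixel's 3D position is requested
    t_cam: int  # frame whose camera the 3D position is expressed in


def depth_map_queries(t: int, height: int, width: int) -> List[Query]:
    # Hypothetical recipe: every pixel of frame t, asked about time t, seen from camera t.
    return [Query(float(u), float(v), t, t, t) for v in range(height) for u in range(width)]


def point_track_queries(u: float, v: float, t_src: int, num_frames: int, t_cam: int) -> List[Query]:
    # Hypothetical recipe: one pixel from frame t_src, asked about every timestep,
    # always expressed in the same reference camera t_cam.
    return [Query(u, v, t_src, t, t_cam) for t in range(num_frames)]


depth_qs = depth_map_queries(t=3, height=4, width=6)
track_qs = point_track_queries(u=10.5, v=20.0, t_src=0, num_frames=8, t_cam=0)
print(len(depth_qs), len(track_qs))  # 24 8
```

For a 4x6 frame the depth request is 24 queries, while the track is 8 queries, one per frame. Only the batch of queries changes between tasks; the model answering them stays the same.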
The key engineering insight is that queries are independent of each other, so they can be processed in parallel on modern AI hardware. That makes D4RT fast and scalable whether it is tracking a few points or reconstructing an entire scene. It also sidesteps the dense per-frame decoding that bogs down most prior 4D systems, which must output a full depth map or flow field whether or not every pixel is needed.
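The sketch below illustrates why independence matters, under heavy simplifying assumptions. `toy_decode` is a hypothetical stand-in for the decoder: it answers one query using only a shared `scene_code` (standing in for the encoder's output) and never looks at other queries. To keep it runnable, the "scene" is a static plane at a fixed depth seen by a fixed pinhole camera, so the answer is a plain back-projection and the time and camera fields are ignored. `decode_in_chunks`, also hypothetical, splits the queries into chunks and decodes them concurrently; because no query depends on another, the result is identical for any chunk size.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class Query:
    u: float    # source pixel column
    v: float    # source pixel row
    t_src: int  # frame the pixel comes from
    t_tgt: int  # time at which the 3D position is requested
    t_cam: int  # frame whose camera the answer is expressed in


def toy_decode(scene_code: Dict[str, float], q: Query) -> Point3D:
    # Hypothetical decoder stand-in: reads only the shared scene code and this one query.
    # Toy scene: a static plane at constant depth viewed by a fixed pinhole camera,
    # so t_src, t_tgt and t_cam do not change the answer.
    f, cx, cy, z = scene_code["f"], scene_code["cx"], scene_code["cy"], scene_code["depth"]
    return ((q.u - cx) * z / f, (q.v - cy) * z / f, z)


def decode_in_chunks(scene_code: Dict[str, float], queries: Sequence[Query],
                     chunk_size: int, workers: int = 4) -> List[Point3D]:
    # Split the query list into chunks; no chunk needs another chunk's results.
    chunks = [queries[i:i + chunk_size] for i in range(0, len(queries), chunk_size)]

    def run(chunk: Sequence[Query]) -> List[Point3D]:
        return [toy_decode(scene_code, q) for q in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so chunks reassemble correctly.
        per_chunk = list(pool.map(run, chunks))
    return [p for chunk_result in per_chunk for p in chunk_result]


scene = {"f": 100.0, "cx": 2.0, "cy": 1.0, "depth": 2.0}
queries = [Query(float(u), float(v), 0, 0, 0) for v in range(3) for u in range(5)]
one_at_a_time = decode_in_chunks(scene, queries, chunk_size=1)
all_at_once = decode_in_chunks(scene, queries, chunk_size=len(queries))
print(one_at_a_time == all_at_once)  # True
```

Python threads here only illustrate the scheduling freedom; they will not speed up CPU-bound work. On a GPU or TPU, the same property presumably lets the decoder evaluate many queries as one batched operation, with the batch sized by how many points you ask about rather than by every pixel in the video.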

