xAI just dropped grok-imagine-video-1.5-preview, a new image-to-video model that immediately claimed a top-tier spot on the Artificial Analysis Video Leaderboard. The model sits at #2 in the Image-to-Video (With Audio) rankings, behind only ByteDance's Seedance 2.0, and holds #3 in the no-audio rankings. For a preview-stage release, that's a strong opening statement in an increasingly crowded field.

What it actually does

The model takes a single still image and turns it into fluid, cinematic video. Give it a starting frame and a prompt describing the motion, and it animates the scene, including camera moves, atmosphere, and physics, while staying faithful to your source image. That fidelity to the source is the key design choice here: the model holds detail and lighting from the input frame, so the result continues the original image rather than reinterpreting it.

The headline capability that separates 1.5 from its predecessor is native audio. You feed it a still image plus a motion-focused prompt, and it produces a short clip with synchronized audio: dialogue, sound effects, ambient sound, and music. The audio is generated in the same inference pass as the video rather than bolted on afterward. That single-pass audio is a meaningful differentiator, because silent AI clips increasingly feel unfinished. A product turntable needs subtle room tone or mechanical sound. A character animation needs breath, cloth movement, footsteps, or environmental ambience. A cinematic shot needs sound design that matches the visual mood.

The leaderboard picture

According to the live Artificial Analysis Image-to-Video leaderboard, the current rankings with audio look like this:

Rank | Model | Elo | Price / min
1 | Dreamina Seedance 2.0 720p
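
For readers unfamiliar with the Elo column, here is a minimal sketch of how a rating gap translates into an expected head-to-head preference rate. It assumes the standard logistic Elo formula with a 400-point scale; the leaderboard's exact rating method is not described here, and the two ratings below are hypothetical placeholders, not values from the rankings.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected share of pairwise votes won by A over B under standard Elo.

    Assumes the classic logistic curve with a 400-point scale; an arena-style
    leaderboard may use a variant of this.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical ratings for illustration only (not leaderboard data).
    leader, challenger = 1300.0, 1250.0
    rate = expected_win_rate(leader, challenger)
    print(f"Leader preferred in about {rate:.1%} of head-to-head votes")
```

Under this assumption, a gap of a few dozen points means the higher-ranked model is preferred only modestly more often than not, so adjacent leaderboard positions can be closer than the rank numbers suggest.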