
Video generation benchmarks have been around for a while, but benchmarking video editing (taking an existing clip and modifying it with a text instruction) is a different problem entirely. Artificial Analysis just launched a dedicated Video Editing Arena to fill that gap, putting six of the most capable frontier models through blind, human-preference comparisons on real editing tasks.
What the Arena actually tests
The arena is structured as a series of head-to-head matchups: you watch two edited versions of the same source clip, without knowing which model produced which, and vote for the better result. Rankings are derived from an Elo rating system built on those blind votes, so a higher Elo score means a model wins more often.
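To make the Elo idea concrete, here is a minimal sketch of how a textbook Elo update turns a single blind vote into a rating change. The starting rating of 1000, the K-factor of 32, and the model names are illustrative assumptions; Artificial Analysis has not been quoted here on its exact parameters or fitting procedure, so treat this as the standard formula rather than the arena's actual implementation.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one blind vote.

    k is an assumed K-factor; larger values make ratings move faster.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the total rating pool stays constant.
    return rating_a + delta, rating_b - delta


# Hypothetical models and votes, for illustration only.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b", True), ("model_a", "model_b", True)]

for left, right, left_won in votes:
    ratings[left], ratings[right] = update_elo(
        ratings[left], ratings[right], left_won
    )

print(ratings)
```

Each win against an equally rated opponent is worth more than a win against a weaker one, which is why a model has to keep winning against strong competitors to climb the leaderboard.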
The editing categories are deliberately chosen to reflect where models actually differ in practice:
- Visual effects (VFX) editing
- Sound and speech editing
- Object addition, removal, and replacement
- Restyling and relighting
- Background changes
- Physics simulation
Crucially, the arena tests both with-audio and no-audio conditions. When audio is included, you're judging the sound too, so headphones in a quiet space are recommended. This matters because audio-visual synchronization is one of the sharpest differentiators between today's frontier models.
The six models in the ring
The launch lineup covers the main contenders in the video editing space right now:
- Dreamina Seedance 2.0: ByteDance's multimodal model. It accepts images, videos, audio, and text as inputs, giving creators control over every aspect of generation. It preserves product details, logos, and text across frames, which matters for e-commerce.
