Google's Gemini Omni Lets You Edit Video Through Plain-English Chat

May 20, 2026

2 min read

Google just shipped something that video editors will have complicated feelings about. Announced at Google I/O 2026, Gemini Omni is a new model that lets you create, remix, and edit video through plain-English conversation. Not a timeline. Not keyframes. Just chat. The first model in the family, Gemini Omni Flash, went live the same day for subscribers worldwide.

Gemini Omni is a multimodal AI model designed specifically for video editing, compositing, and remixing tasks. Unlike earlier Gemini models that primarily handled video description and retrieval, Gemini Omni can actively edit and manipulate footage based on natural language instructions. That distinction matters more than it sounds. Most AI video tools today are generation-first: you write a prompt, get a clip. Gemini Omni is designed to work with video you already have, treating editing as an AI-native task rather than an afterthought.

The architecture behind it

Omni fuses three previously separate Google DeepMind systems into one architecture: Gemini (the reasoning engine, which understands language, intent, physics, culture, history, and science), Veo (the video rendering backbone, which handles frame-level generation quality, motion, and resolution), and Genie (the world simulation layer, Google's interactive game-world engine, which models how objects, environments, and physics behave over time).

Most earlier AI video tools followed a sequential pipeline: convert the input into text descriptions, then pass those descriptions to a separate video renderer. Gemini Omni works differently. It is built on a native multimodal model, one that processes all media types simultaneously within a single core engine rather than routing them through isolated steps. This matters because skipping conversion layers means the model retains richer context. When you supply a reference photo alongside a text prompt, Omni reasons across both at once, preserving visual details that a text-conversion step would typically flatten.
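
To make the "flattening" point concrete, here is a toy Python sketch of the difference. It is not Google's implementation or API, and nothing in it reflects how Omni works internally. The `ReferencePhoto` class, its fields, and both functions are hypothetical names invented for illustration. The sketch only assumes that a text-conversion step produces a short caption, and checks which details of the photo are still visible to the renderer afterward.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReferencePhoto:
    """Hypothetical stand-in for the visual details in a reference photo."""
    subject: str
    color: str
    texture: str
    lighting: str


def caption(photo: ReferencePhoto) -> str:
    """Lossy text-conversion step: this toy caption keeps only color and subject."""
    return f"a {photo.color} {photo.subject}"


def details_seen_sequential(prompt: str, photo: ReferencePhoto) -> set[str]:
    """Sequential pipeline: the renderer receives text only."""
    renderer_input = f"{prompt}, featuring {caption(photo)}"
    # A detail survives only if its value made it into the text.
    return {name for name, value in asdict(photo).items() if value in renderer_input}


def details_seen_joint(prompt: str, photo: ReferencePhoto) -> set[str]:
    """Joint processing: the prompt and the full photo are available together."""
    return set(asdict(photo))


photo = ReferencePhoto("jacket", "red", "worn leather", "golden-hour backlight")
prompt = "make the rider turn toward the camera"

sequential = details_seen_sequential(prompt, photo)
joint = details_seen_joint(prompt, photo)
lost = sorted(joint - sequential)
print(lost)  # ['lighting', 'texture']
```

In this toy setup the caption carries the jacket and its color through to the renderer, but the texture and lighting never reach it. A model that reasons over the photo and the prompt together has no such bottleneck, which is the advantage the article attributes to Omni.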
