
Runway just added Seed Audio 1.0 to its platform for all paid subscribers. The model, built by ByteDance's Seed team, is not another text-to-speech tool. It is a full audio scene generator: one prompt, and you get dialogue, background music, ambient sound, and foley-style effects all mixed together and ready to use.
Seed Audio 1.0 was launched at the Volcano Engine FORCE 2026 conference. Its arrival on Runway brings it to a much wider creative audience, integrated directly into the same workspace where users already generate video.
One prompt, an entire soundscape
The core idea is a shift in the model's unit of output. Seed Audio 1.0 points to a broader category: prompt-directed audio production, where a model generates dialogue, speaker roles, emotion, background music, ambience, and sound effects as one coherent audio scene. That is a fundamentally different contract from the one TTS tools offer.
What makes Seed Audio 1.0 genuinely interesting is that it does two hard things at once. It generates net-new, film-quality audio from a text prompt, and it repairs and reshapes audio you already have, filling silent gaps, swapping lines, extending clips, and generating alternate endings. It is a cinematic audio generator and an audio editor in a single model, which is rare.

What it can actually do
Key capabilities include zero-shot voice cloning from short reference clips; multi-character dialogue generation in a single pass, with a distinct voice per speaker; simultaneous generation of voice, background music, and sound effects; and cross-lingual synthesis without fine-tuning.
The model accepts three types of input to guide its output:
- Text prompts: describe the scene, characters, emotion, location, and sound design in natural language
