AI can generate audio from scratch with impressive fidelity. But can it actually edit existing audio -- changing only what you ask while leaving everything else untouched? A new benchmark from Tencent Hunyuan and collaborators across SJTU, NTU, PKU, FDU, and others says the answer is: barely. MMAE (Massive Multitask Audio Editing Benchmark) is the first comprehensive evaluation testbed for instruction-based audio editing, and its headline finding is damning -- current models achieve an Exact Match Rate (EMR) below 5%.

Generation is not editing

The distinction matters enormously in practice. Audio generation means producing something new from a text prompt. Audio editing means taking a real clip -- a podcast recording, a music track, a sound effect reel -- and making a targeted modification based on a natural language instruction, while preserving everything else. Think: "Remove the background music but keep the speech," or "Put emphasis on the word 'cold'," or "Extract only the guitar from this mix."

Despite progress in the field, audio editing has had no unified evaluation standard. Without a widely accepted open-source benchmark, evaluation protocols have fragmented and reliable objective metrics are lacking. The most commonly used metric has been the CLAP Score, which measures similarity between the edited audio and the target prompt in an embedding space. As a proxy, it tells you almost nothing about whether untouched regions were actually preserved -- which is the whole point of editing.
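To make the gap concrete, here is a minimal sketch in plain Python. It is not the real CLAP model and not MMAE's scoring code: the embeddings are made-up vectors, the "audio" is random noise, and `preservation_error` is a hypothetical helper introduced only for illustration. The point is structural: a prompt-similarity score compares the edited clip to the text and never looks at the original, so it cannot see how much of the untouched audio survived.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def preservation_error(original, edited, edit_mask):
    """Mean absolute difference over samples OUTSIDE the edited region.

    Hypothetical helper: edit_mask[i] is True where the edit was meant to happen.
    """
    diffs = [abs(o - e) for o, e, m in zip(original, edited, edit_mask) if not m]
    return sum(diffs) / len(diffs)

# A CLAP-style score takes only the edited audio's embedding and the prompt's
# embedding. These two vectors are placeholders, not real model outputs.
edited_audio_embedding = [0.2, 0.9, 0.1]
prompt_embedding = [0.1, 0.8, 0.3]
clap_style_score = cosine_similarity(edited_audio_embedding, prompt_embedding)

# Toy "waveform": the edit is supposed to touch samples 250..499 only.
random.seed(0)
n = 1000
original = [random.gauss(0.0, 1.0) for _ in range(n)]
edit_mask = [250 <= i < 500 for i in range(n)]

# A targeted edit silences the masked region and leaves the rest alone.
targeted = [0.0 if m else x for x, m in zip(original, edit_mask)]
# A "regenerate everything" edit replaces the whole clip.
regenerated = [random.gauss(0.0, 1.0) for _ in range(n)]

targeted_err = preservation_error(original, targeted, edit_mask)
regenerated_err = preservation_error(original, regenerated, edit_mask)
print(targeted_err)     # 0.0, since nothing outside the mask changed
print(regenerated_err)  # clearly above zero
```

Nothing in `clap_style_score` depends on `original`, which is why a model that regenerates the whole clip could still score well on prompt similarity while failing the preservation check.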

What MMAE actually tests

MMAE targets general-purpose instruction-based audio editing and covers a broad spectrum of real-world scenarios across 7 distinct audio modalities, including sound, speech, music, and their mixtures. The benchmark was built through a systematic five-stage pipeline: brainstorming, taxonomy construction, instruction-centric data collection, rubric annotation, and quality inspection -- all done through human-agent collaboration.
