Liquid AI's LFM2.5 Beats 7.7B Japanese Voice Models at 1.5B Parameters

Liquid AI

Jun 06, 2026

2 min read

AUDIO

speech_to_text text_to_speech

LLMS

small_models

Jun 06, 2026

AUDIO

speech_to_text text_to_speech

LLMS

small_models

2 min read

Liquid AI just dropped two new Japanese-language models, and the headline number is hard to ignore: a 1.5B-parameter audio model that outperforms a 7.7B-parameter competitor on conversational Japanese benchmarks. The release covers two distinct models , LFM2.5-Audio-1.5B-JP, an end-to-end speech model, and LFM2.5-1.2B-JP-202606, an updated general-purpose Japanese chat model , both available now on Hugging Face.

One Model to Rule ASR, TTS, and Chat

The audio model is the more novel of the two. Most Japanese voice pipelines today are assembled from separate components: an ASR model to transcribe speech, a language model to generate a response, and a TTS model to speak it back. LFM2.5-Audio-1.5B-JP is an end-to-end multimodal speech and text language model that does not require separate ASR and TTS components. You speak in, it speaks back , one model, one forward pass.

Designed with low latency and real-time conversation in mind, at only 1.5 billion parameters LFM2.5-Audio-JP enables seamless Japanese conversational interaction, achieving capabilities on par with much larger models. That last part is the key claim: the model punches well above its weight class.

The Architecture Under the Hood

The model consists of a pretrained LFM2.5 model as its multimodal backbone, along with a FastConformer-based audio encoder to handle continuous audio inputs, and an RQ-transformer generating discrete tokens coupled with a lightweight audio detokenizer for audio output. FastConformer is a convolution-augmented transformer architecture originally developed by NVIDIA for efficient speech encoding. The RQ-transformer (Residual Quantization transformer) handles converting the language model's outputs into discrete audio codes, which are then decoded into a waveform.

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

Full access to in-depth AI research breakdowns
Be the first to know what's trending before it hits mainstream
Daily curated papers, repos, and industry moves

Liquid AI's LFM2.5 Beats 7.7B Japanese Voice Models at 1.5B Parameters

Takeaways

One Model to Rule ASR, TTS, and Chat

The Architecture Under the Hood

Don't miss what's next in AI