Liquid AI's LFM2.5 Beats Larger Rivals at Multilingual Search in 11 Languages

Liquid AI

1D AGO

2 min read

1 day ago

2 min read

Retrieval is one of those problems where you're always trading something. Speed costs accuracy. Accuracy costs index size. Multilingual coverage costs both. Liquid AI just released two models that try to sidestep that tradeoff: LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M, a pair of 350M-parameter multilingual retrieval models that cover 11 languages and hit sub-2ms query latency on GPU.

Two models, one decision

Both models are 350M parameters and the first bidirectional members of the LFM family, built on LFM2.5-350M-Base. They are designed for fast and reliable multilingual and cross-lingual search across 11 languages, with a footprint small enough to run almost anywhere.

The two models represent different points on the speed-vs-accuracy curve:

LFM2.5-Embedding-350M: turns each document into a single vector. Pick it when you want the fastest search and the smallest, cheapest index.
LFM2.5-ColBERT-350M: converts each token into a vector rather than a single vector per document. This lets it match queries word-by-word for higher accuracy and better generalization, at the cost of a larger index. Pick it when accuracy matters more than storage.

The ColBERT approach (late interaction) is worth understanding. Queries and documents are encoded separately at the token level. The system compares token vectors at query time using MaxSim (maximum similarity). This preserves fine-grained token interactions without the full cost of joint cross-attention, and allows pre-computation for documents. In practice, you get something closer to a reranker's precision at a bi-encoder's speed.

The architecture trick that makes it work

Both models are built from LFM2.5-350M-Base, a mid-trained general-purpose checkpoint. Liquid AI applies a small set of bidirectional patches to the LFM2 architecture, adapting it from a causal decoder to a bidirectional encoder.

This is the key architectural move. Standard LFMs are causal: each token only sees what came before it, which is great for text generation but suboptimal for retrieval. The team replaces the causal attention mask with a bidirectional one, so every token can attend to both left and right context. They also make the LFM2 short convolutions non-causal, so they mix local information symmetrically around each token rather than only from the past.

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

Full access to in-depth AI research breakdowns
Be the first to know what's trending before it hits mainstream
Daily curated papers, repos, and industry moves

Liquid AI's LFM2.5 Beats Larger Rivals at Multilingual Search in 11 Languages

Takeaways

Two models, one decision

The architecture trick that makes it work

Don't miss what's next in AI