

The prevailing assumption in image generation has been simple: if you want quality, you need a server. PrismML just made a serious case against that. The Pasadena-based startup, founded by Caltech researchers and backed by Khosla Ventures, has released Bonsai Image 4B, a family of compressed text-to-image models that run entirely on-device, from Apple Silicon Macs down to iPhones, with no API calls and no subscription required.
A gigabyte of image generation
The headline number is striking. The 1-bit variant compresses the diffusion transformer to 0.93 GB, an 8.3x reduction from full precision. The ternary variant comes in at 1.21 GB, a 6.4x reduction. For context, the original FLUX.2 Klein 4B that Bonsai is built on needs 7.75 GB for the diffusion transformer alone.
The two variants are designed for different priorities:
- 1-bit Bonsai Image 4B uses binary weights {-1, +1} with an FP16 group-wise scaling factor, giving 1.125 effective bits per weight. It is the maximum-compression option, built for the tightest memory budgets.
- Ternary Bonsai Image 4B uses weights {-1, 0, +1} with FP16 group-wise scaling, giving 1.71 effective bits per weight. The extra zero state adds representational flexibility, recovering visual quality and prompt fidelity at the cost of a slightly larger footprint.
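Where do fractional figures like 1.125 and 1.71 bits come from? A rough sketch: each weight costs log2(number of states) bits, plus its share of the FP16 scale stored once per group. The group size of 128 below is an assumption, not something PrismML's announcement states. It is simply the value that reproduces both quoted numbers. The ternary figure also assumes ideal packing at log2(3) bits per weight, which a real storage format may only approximate.

```python
import math

def effective_bits(levels: int, group_size: int = 128, scale_bits: int = 16) -> float:
    """Bits per weight: ideal code length for `levels` states,
    plus one `scale_bits` scale amortized over `group_size` weights.
    group_size=128 is an assumed value, not confirmed by the source."""
    return math.log2(levels) + scale_bits / group_size

binary_bpw = effective_bits(2)   # states {-1, +1}
ternary_bpw = effective_bits(3)  # states {-1, 0, +1}

print(f"1-bit:   {binary_bpw:.3f} bits/weight")
print(f"ternary: {ternary_bpw:.2f} bits/weight")
```

Under these assumptions, the scaling factor accounts for 0.125 bits per weight in both variants, and the rest of the gap between them is the cost of the extra zero state.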
How it actually works
Quantization, the process of replacing high-precision floating-point weights with lower-precision representations, is not new. INT8 and INT4 quantization are standard tools. But 1-bit quantization is an entirely different level of aggression: nearly every weight in the transformer is reduced to just two possible values. Instead of preserving exact precision, the model preserves structure. Surprisingly, diffusion transformers seem to survive that tradeoff far better than expected.
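To make the idea concrete, here is a minimal sketch of group-wise binary and ternary quantization in plain Python. It is not PrismML's method, which the announcement does not detail. The scale choice (mean absolute value of the group) and the ternary threshold (0.7 times the mean absolute value) are assumptions borrowed from common heuristics in the quantization literature, and every function name is hypothetical.

```python
def quantize_group_binary(weights):
    """Map one group of weights to {-1, +1} codes plus a shared scale.
    Assumption: the scale is the group's mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights)
    codes = [1 if w >= 0 else -1 for w in weights]
    return codes, scale

def quantize_group_ternary(weights, threshold_ratio=0.7):
    """Map one group to {-1, 0, +1} codes plus a shared scale.
    Assumption: weights within threshold_ratio * mean|w| of zero become 0,
    and the scale is the mean magnitude of the weights that survive."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale

def dequantize(codes, scale):
    """Reconstruct approximate weights: each code times the group scale."""
    return [c * scale for c in codes]

group = [0.5, -0.25, 0.05, -1.0]  # toy group; real groups are much larger
bin_codes, bin_scale = quantize_group_binary(group)
ter_codes, ter_scale = quantize_group_ternary(group)

print(bin_codes, dequantize(bin_codes, bin_scale))
print(ter_codes, dequantize(ter_codes, ter_scale))
```

In this toy group, the binary version keeps only the sign pattern and one shared magnitude, while the ternary version can also zero out the two smallest weights. That is the "extra zero state" at work: small weights no longer have to be rounded up to a full-magnitude value.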
Bonsai Image 4B is built on FLUX.2 Klein 4B. It keeps the architecture intact but changes how the transformer weights are represented. By moving those weights into binary and ternary form, Bonsai shrinks the part of the image pipeline that matters most for local deployment. Critically, a small set of precision-sensitive supporting tensors (~5%), called the projection layers, remains in FP16. That is how the final 1-bit Bonsai Image 4B transformer lands at 0.93 GB, an 8.3x reduction from the 7.75 GB full-precision FLUX.2 Klein 4B.
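A back-of-envelope check shows why keeping a small FP16 slice matters so much for the final size. The sketch below assumes the 7.75 GB baseline stores 16 bits per weight and that "~5%" means 5% of the parameters. Neither assumption is confirmed by the announcement, so the result should only land in the same ballpark as the published sizes, not match them exactly.

```python
BASELINE_GB = 7.75    # full-precision diffusion transformer, from the article
BASELINE_BITS = 16    # assumption: baseline uses 16 bits per weight
FP16_FRACTION = 0.05  # assumption: "~5%" is a share of parameters

def estimated_size_gb(quant_bits: float) -> float:
    """Estimate transformer size when FP16_FRACTION of the weights stay at
    16 bits and the remainder are stored at `quant_bits` bits per weight."""
    avg_bits = FP16_FRACTION * 16 + (1 - FP16_FRACTION) * quant_bits
    return BASELINE_GB * avg_bits / BASELINE_BITS

one_bit_gb = estimated_size_gb(1.125)  # effective bits quoted for the 1-bit variant
ternary_gb = estimated_size_gb(1.71)   # effective bits quoted for the ternary variant

print(f"1-bit estimate:   {one_bit_gb:.2f} GB (published: 0.93 GB)")
print(f"ternary estimate: {ternary_gb:.2f} GB (published: 1.21 GB)")
```

Under these assumptions, the FP16 tensors contribute a large share of the average bits per weight despite being a small share of the parameters, which is why the real compression ratio is 8.3x rather than the roughly 14x that 1.125 bits per weight alone would suggest.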
