Sakana AI Unifies Two Rival AI Optimization Camps to Slash LLM Merging Costs

Sakana AI

4H AGO

2 min read

LLMS

TRAINING_INFRA

pretraining scaling_laws

When you train a neural network, you usually have gradients to guide you. But a huge class of real problems -- tuning a robot in a physics simulator, querying a closed external API, searching for the best merge of two language models -- gives you nothing but a score. No gradients, no direction, just a number. That is the domain of Black-Box Optimization (BBO), and for decades it has been split into two camps that barely talked to each other. Sakana AI's new ICML 2026 paper, "Bridging Spherical Black-Box Optimizers", argues those camps were solving the same equation all along.

Two tribes, one equation

The BBO field has historically divided into two families:

Parametric methods -- algorithms like Evolution Strategies (ES) and CMA-ES maintain a probability distribution over candidate solutions and update its parameters each step. They scale gracefully to thousands of dimensions but converge to a single solution, which means they can get stuck in local optima.
Nonparametric (particle-based) methods -- algorithms like Consensus-Based Optimization (CBO) maintain a swarm of individual candidate solutions ("particles") that vote on where to move next. They can find multiple distinct solutions simultaneously, but they fall apart in high dimensions because the particles spread too thin.

The tradeoff felt fundamental. If you needed to search a high-dimensional space, you used ES. If you needed to escape local optima and find multiple good solutions, you used CBO. Nobody had a principled way to get both at once.

While Evolution Strategies, Consensus-Based Optimization, Optimization via Integration (OVI), and related methods have each been studied independently, their connections remained underexplored. The Sakana team unified these approaches within a common theoretical framework, revealing that they differ primarily in two design choices: fitness aggregation (controlling how sharply the algorithm focuses on the best candidates) and consensus scope (controlling whether the algorithm seeks one solution or many).
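To make the two design choices concrete, here is one common way to write them down. This uses the standard softmax-style weighting from the Consensus-Based Optimization literature as an illustration; the symbols $\beta$ (sharpness), $k$ (a locality kernel), and $m$ (the consensus point) are assumptions for this sketch and are not necessarily the paper's notation.

```latex
% Fitness aggregation: weights over N candidates x_1, ..., x_N when minimizing f.
% Larger \beta concentrates weight on the best candidates.
w_i = \frac{\exp\bigl(-\beta f(x_i)\bigr)}{\sum_{j=1}^{N} \exp\bigl(-\beta f(x_j)\bigr)}

% Global consensus scope: one weighted average shared by all candidates.
m = \sum_{i=1}^{N} w_i \, x_i

% Local consensus scope: each candidate x_i gets its own average,
% with a kernel k(x_i, x_j) (assumed here) down-weighting distant candidates.
m_i = \frac{\sum_{j=1}^{N} k(x_i, x_j) \, w_j \, x_j}{\sum_{j=1}^{N} k(x_i, x_j) \, w_j}
```

As $\beta \to 0$ the weights become uniform and $m$ is the plain mean of the candidates; as $\beta \to \infty$ the weight collapses onto the single best candidate.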

What the unification actually means

Think of it this way: every algorithm in both families takes a weighted average of candidate solutions, then adds noise. Parametric methods compute that average globally, across all candidates (one consensus), while particle-based methods compute it locally, per particle (many consensus points). The fitness aggregation knob controls how aggressively the algorithm ignores bad candidates -- turn it up and you get sharp, fast convergence; turn it down and you get a flatter, more exploratory search.
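The "weighted average plus noise" picture can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function name `consensus_step`, the softmax weighting with sharpness `beta`, the Gaussian locality kernel with width `bandwidth`, and the fixed `noise_scale` are all hypothetical choices made for this example.

```python
import numpy as np

def consensus_step(particles, scores, beta, bandwidth, noise_scale, rng):
    """One illustrative update: weighted average of candidates, then noise.

    particles: (n, d) array of candidate solutions
    scores:    (n,) array of objective values (lower is better)
    beta:      fitness aggregation sharpness (assumed softmax form)
    bandwidth: consensus scope; np.inf gives one global consensus,
               small values give a separate local consensus per particle
    """
    n, d = particles.shape
    # Fitness aggregation: log-weights favor low scores; shifting by the
    # minimum keeps the exponentials numerically stable.
    log_w = -beta * (scores - scores.min())

    if np.isinf(bandwidth):
        # Global scope: every row uses the same weights.
        log_w_ij = np.broadcast_to(log_w, (n, n))
    else:
        # Local scope: a Gaussian kernel down-weights distant particles.
        diff = particles[:, None, :] - particles[None, :, :]
        sq_dist = (diff ** 2).sum(axis=-1)
        log_w_ij = log_w[None, :] - sq_dist / (2.0 * bandwidth ** 2)

    # Normalize each row into a probability vector.
    log_w_ij = log_w_ij - log_w_ij.max(axis=1, keepdims=True)
    w = np.exp(log_w_ij)
    w /= w.sum(axis=1, keepdims=True)

    consensus = w @ particles  # (n, d): one consensus point per particle
    return consensus + noise_scale * rng.standard_normal((n, d))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(64, 10)) * 3.0
    for _ in range(50):
        f = (x ** 2).sum(axis=1)  # toy objective: sphere function
        x = consensus_step(x, f, beta=5.0, bandwidth=np.inf,
                           noise_scale=0.1, rng=rng)
    print("best score:", (x ** 2).sum(axis=1).min())
```

With `bandwidth=np.inf` every particle is pulled to the same point, which mirrors the single-solution behavior described for the parametric family. With a small `bandwidth`, each particle averages only over its neighbors, so separate clusters can persist around different optima. Raising `beta` moves the consensus toward the best-scoring candidate; lowering it moves the consensus toward the plain mean.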
