Nous Research's Hermes Agent Hits 4,500x Faster Memory Search in Massive Overhaul

Nous Research

May 28, 2026

2 min read

AGENTS

agent_frameworks memory tool_use

OPEN_SOURCE

May 28, 2026

AGENTS

agent_frameworks memory tool_use

OPEN_SOURCE

2 min read

Hermes Agent v0.15.0, dubbed "The Velocity Release," is the biggest single update in the project's short but explosive history. Since v0.14.0, the release spans 1,302 commits, 747 merged PRs, 1,746 files changed, and 560+ issues closed, with 321 community contributors. For a project that went from launch to roughly 180,000 GitHub stars in about three months, this release is a statement: Nous Research is treating Hermes as serious infrastructure, not a demo.

The monolith falls

The centerpiece of this release is a refactor that has been a long time coming. The 16,083-line run_agent.py collapses to 3,821 lines (-76%) across 14 cohesive agent/* modules. That single file was the agent's entire conversation loop , the thing that runs on every single turn. At 16,000 lines, it was nearly impossible to grep, nearly impossible to contribute to, and took 90 seconds to load in most editors.

The refactor is behavior-preserving: every extraction keeps a thin forwarder on AIAgent, every test patch path still works, and every external caller stays compatible. But the downstream effects are real. The run_agent.py refactor is a maintainability win first , but it also unlocked the performance work that defines this release, and it's what made the new desktop app (shipped days later) architecturally feasible.

Speed, everywhere

The performance gains in v0.15.0 are specific and measurable, not vague claims. Four separate optimization passes landed:

Cold start: Another second shaved off launch, 47% fewer per-conversation function calls, with hermes --version flipping the head-to-head benchmark against Codex CLI. Specifically, Termux cold start drops from 2.9s to 0.8s, and hermes --version wall time drops 63% from 701ms to 258ms.
Per-turn overhead: Deferring the openai._base_client import saves 240ms and 17MB on every CLI invocation. Adaptive subprocess polling cuts ~195ms per tool call, adding up to over a second per turn.
Memory search: session_search is roughly 4,500x faster (~90s to ~20ms) and now free. The old version called an auxiliary LLM to summarize sessions, costing ~$0.30 per call and sometimes hallucinating results when the right session wasn't even in the hit list.

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

Full access to in-depth AI research breakdowns
Be the first to know what's trending before it hits mainstream
Daily curated papers, repos, and industry moves

Nous Research's Hermes Agent Hits 4,500x Faster Memory Search in Massive Overhaul

Takeaways

The monolith falls

Speed, everywhere

Don't miss what's next in AI