Hermes Agent, the open-source autonomous agent from Nous Research, just shipped a major overhaul to how it reads the web. The headline numbers are striking: up to 60x faster and 49x cheaper web extraction. But the real story is in the engineering decisions that got there.

The problem with reading the web at scale

When an agent browses the web, the naive approach is to fetch a page, dump the full HTML or markdown into the model's context window, and let it figure things out. That works fine for a short blog post. It falls apart fast when you hit a documentation site, a forum thread, or a news article with embedded comments , pages that can balloon to hundreds of thousands of characters.

The old behavior meant two things: slow round-trips because the agent was processing redundant content, and expensive LLM calls because every token in that bloated page costs money. If you're running an agent that browses dozens of pages per task, this compounds quickly.

What changed under the hood

The update targets two bottlenecks directly:

  • Cleaner scraping backends , providers like Firecrawl now pass clean, pre-processed content straight to the agent, cutting out redundant intermediate steps that were adding latency without adding value.
  • On-demand page paging , large pages are saved locally and served to the agent in chunks as needed, rather than being loaded into context all at once. Same quality, fraction of the cost.
Alpha Signal

Don't miss what's next in AI

Join 300,000+ engineers and researchers who get the signal, not the noise.

  • Full access to in-depth AI research breakdowns
  • Be the first to know what's trending before it hits mainstream
  • Daily curated papers, repos, and industry moves