Anthropic's Claude Fable 5 Beats Every Rival but Blocks Itself on Dangerous Requests

Artificial Analysis

6D AGO

LLMS

hallucinations long_context vision_language

BENCHMARKS

Two months after Anthropic quietly unveiled Claude Mythos to a handful of partners and then quietly panicked about what it had built, the company is ready to let everyone use it. Anthropic launched Claude Fable 5, the first publicly available version of its Mythos model, which excels at software engineering, knowledge work, and vision but comes with hard safety limits. The catch: in high-risk areas like cybersecurity, biology, chemistry, and distillation, the model blocks responses and falls back to Claude Opus 4.8.

Anthropic first revealed Claude Mythos Preview in April, noting that the model proved particularly adept at finding vulnerabilities across every major operating system and web browser, despite not being designed for cybersecurity. Because of those cybersecurity concerns, the preview was initially limited to a handful of partners. Last week, Anthropic expanded access to hundreds of organizations across 15 countries, again focusing on those that manage critical infrastructure. Now, a version of that same underlying model is available to anyone.

The safety architecture that makes this possible

The key innovation that unlocks public access is a new classifier-based safety system. Claude Fable 5 represents a "significant jump" in capability, which is why Anthropic had to implement additional guardrails to prevent misuse. If a user asks a high-risk question, like how to make ricin, the model will block its response and fall back to Claude Opus 4.8 to deliver a safe answer.
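The block-and-fall-back behavior described above can be pictured as a routing step placed in front of two models. The Python sketch below is purely illustrative and is not Anthropic's implementation: the keyword lists stand in for a trained classifier, and every name in it (RISK_KEYWORDS, toy_risk_classifier, route_request, Routed) is a hypothetical chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword lists standing in for a trained risk classifier.
# A production system would use a learned model, not substring matching.
RISK_KEYWORDS = {
    "biology_chemistry": ("ricin", "nerve agent"),
    "cybersecurity": ("zero-day exploit", "ransomware payload"),
}


@dataclass
class Routed:
    model: str                    # "primary" or "fallback"
    risk_category: Optional[str]  # None when the prompt was not flagged
    text: str                     # the response that was actually returned


def toy_risk_classifier(prompt: str) -> Optional[str]:
    """Return the first matching risk category, or None if nothing matches."""
    lowered = prompt.lower()
    for category, keywords in RISK_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return None


def route_request(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    classify: Callable[[str], Optional[str]] = toy_risk_classifier,
) -> Routed:
    """Send flagged prompts to the fallback model and all others to the primary."""
    category = classify(prompt)
    if category is None:
        return Routed("primary", None, primary(prompt))
    # Flagged: the primary model is never called for this prompt.
    return Routed("fallback", category, fallback(prompt))
```

In this sketch, primary and fallback are plain callables, so the routing logic can be exercised with stubs. An ordinary coding prompt reaches the primary model, while a prompt mentioning ricin is answered only by the fallback.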
