EDITORIAL LEADERBOARD

#14

Artificial Analysis

Independent benchmarking platform for LLMs and inference API providers. Tracks quality, throughput, latency, and cost across proprietary and open-weight models. Publishes the Intelligence Index, a composite of evaluations including GPQA Diamond, Humanity's Last Exam, SciCode, and Terminal-Bench Hard. Covers reasoning vs. non-reasoning models, context window comparisons, and per-token pricing.

Topics

Subtopics

VISION LANGUAGE, HALLUCINATIONS, LONG CONTEXT

Links

LAST 30 DAYS

Moonshot AI's Kimi K3 Drops 2.8 Trillion Parameters to Beat OpenAI and Anthropic

Artificial Analysis

4 days ago

Google's Gemini Omni Flash Tops Video Leaderboards With Built-In Reasoning

Artificial Analysis

7 days ago

China Mobile's JT-4.1 Flash Challenges DeepSeek With a Free 236B Model

Artificial Analysis

Jul 11

Artificial Analysis Reveals Top AI Agents Fail Half of Real Office Tasks

Artificial Analysis

Jul 09

xAI's Grok 4.5 Tops SaaS Automation Benchmark at a Quarter the Cost

Artificial Analysis

Jul 09

Google DeepMind's Gemini 3.1 Flash Lite Image Generates Photos 2.7x Faster

Artificial Analysis

Jul 08

Artificial Analysis Fixes the Benchmark That Let Bad TTS Models Win

Artificial Analysis

Jul 08

Claude Fable 5 Tops Legal AI Benchmark but Fails 86% of Real Tasks

Artificial Analysis

Jul 07

Artificial Analysis Releases 6 Industry Indices That Reshape How Developers Pick AI Models

Artificial Analysis

Jul 07

AssemblyAI's Universal-3.5 Pro Gives Voice Agents Memory Between Turns

Artificial Analysis

Jul 06
