EDITORIAL LEADERBOARD

#54

vLLM

Open-source LLM inference and serving engine, originated at UC Berkeley's Sky Computing Lab. Built around PagedAttention for efficient KV cache memory management, with continuous batching, tensor and pipeline parallelism, and quantization support (FP8, GPTQ, AWQ). Supports 200+ Hugging Face model architectures with an OpenAI-compatible API.

Topics

Subtopics

INFERENCE OPTIMIZATION, MODEL SERVING, KERNELS

LAST 30 DAYS

vLLM's AFD Plugin Cuts DeepSeek-V3.2 Response Time by 47%

17 hrs ago

vLLM's TileRT Integration Hits 618 tok/s by Splitting AI's Two Hardest Jobs

Jul 15

Ant Group Pushes Qwen3-Omni to 5.4x Faster With 0.6s Audio Response

Jul 03

vLLM-Omni Squeezes 172% More Audio Out of Four Speech Models

Jun 29

Baidu's Unlimited-OCR Parses 200-Page PDFs With Flat Memory Usage

Jun 28

NVIDIA Shrinks GLM-5.2 Memory by 1.8x With NVFP4 Without Losing Accuracy

Jun 26

vLLM v0.23.0 Ships DeepSeek-V4 Production Hardening and 56% Throughput Boost

Jun 15

vLLM Launches vime to Fix RL Fine-Tuning's Worst Numerical Drift Problem

Jun 09

vLLM-Omni v0.22.0 Ships NVIDIA Cosmos 3 Support and Robot Serving

Jun 08

vLLM v0.22.0 Ships a Rust Frontend and Cuts Latency by 28.9%

May 30
