#93

LMSYS Org

Open research collective (UC Berkeley, Stanford, CMU) focused on large model systems. Builds SGLang, a high-throughput inference engine for LLMs and VLMs with radix cache and chunked prefill; FastChat, an open platform for training, fine-tuning, and serving LLMs; Vicuna, a Llama-based open-source chat model; and S-LoRA for multi-adapter serving.

Topics

INFRA

LLMS

OPEN_SOURCE

Subtopics

INFERENCE OPTIMIZATION

MODEL SERVING

AGENT FRAMEWORKS
