#45

DeepSeek

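Chinese AI lab that builds open-weight LLMs in the DeepSeek-V and R1 model families. Its models use sparse mixture-of-experts (MoE) architectures, and its reasoning models are trained with reinforcement learning using GRPO (Group Relative Policy Optimization); R1-Zero was trained with pure RL and no supervised fine-tuning (SFT) stage. V4 Pro has 1.6T total parameters, of which 49B are active per token. Models are released under the MIT License and report frontier-competitive benchmark performance at significantly lower training cost.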
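To make two of the ideas above concrete, here is a minimal Python sketch, not DeepSeek's implementation. It shows the group-relative advantage at the core of GRPO (each sampled answer's reward is normalized against the other answers drawn for the same prompt) and the share of parameters a sparse MoE model activates. The function names, the use of population standard deviation, and the `eps` term are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own sampling group.

    Assumption: population standard deviation plus a small eps to
    avoid division by zero when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def active_fraction(total_params, active_params):
    """Fraction of a sparse MoE model's parameters that are active."""
    return active_params / total_params


# Four sampled answers to one prompt, rewarded 1 if correct, else 0.
advantages = group_relative_advantages([1, 0, 0, 1])

# Parameter counts quoted above for V4 Pro: 1.6T total, 49B active.
share = active_fraction(1.6e12, 49e9)
print(f"active share: {share:.2%}")
```

With the figures quoted above, roughly 3% of the parameters are active at a time, which is the source of the compute savings that sparse MoE designs aim for.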
