George Hotz's Tinygrad Crashes MLPerf With 25,000 Lines of Python

the tiny corp

5H AGO

2 min read

TRAINING_INFRA

distributed_training pretraining

GPUS

kernels

Tinygrad, the open-source deep learning framework built by George Hotz and a six-person team, just made its debut on the official MLPerf Training v6.0 leaderboard. The submission: Llama 3.1 8B pretraining on AMD MI350X GPUs, using a completely custom driver, runtime, kernel library, and training loop. No PyTorch. No ROCm. No CUDA. Just ~25,000 lines of Python.

What MLPerf is and why it matters

The MLPerf Training benchmark suite measures how fast systems can train models to a target quality metric. Think of it as the F1 circuit for AI hardware and software: every major player submits their best setup, results are peer-reviewed, and the numbers are public. The benchmarks are full system tests that stress models, software, and hardware across a range of ML applications, and because the suite is open-source and peer-reviewed, it provides a level playing field that drives innovation, performance, and energy efficiency for the entire industry.
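To make "time to a target quality metric" concrete, here is a minimal Python sketch of the idea. It is not MLPerf's actual harness, whose rules involve considerably more than this; the function `time_to_target`, the `ToyModel` class, and the halving loss curve are all hypothetical stand-ins chosen for illustration.

```python
import time
from typing import Callable, Optional


def time_to_target(
    train_step: Callable[[], None],
    evaluate: Callable[[], float],
    target: float,
    max_steps: int,
    eval_every: int = 1,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return seconds elapsed until evaluate() <= target, or None if never reached."""
    start = clock()
    for step in range(1, max_steps + 1):
        train_step()
        # Only evaluate on the chosen cadence; stop as soon as the target is met.
        if step % eval_every == 0 and evaluate() <= target:
            return clock() - start
    return None


class ToyModel:
    """Hypothetical stand-in for a real training job: the loss halves each step."""

    def __init__(self) -> None:
        self.loss = 8.0

    def step(self) -> None:
        self.loss /= 2

    def eval_loss(self) -> float:
        return self.loss


model = ToyModel()
elapsed = time_to_target(model.step, model.eval_loss, target=1.0, max_steps=100)
print(f"reached target: {elapsed is not None}, final loss: {model.loss}")
```

The point of the sketch is that the score is wall-clock time to reach a fixed quality bar, not raw throughput, so a faster system only counts if the model still converges.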

MLCommons announced the MLPerf Training v6.0 results today, with two new benchmarks added in this round to reflect rapid and significant changes in the AI ecosystem. NVIDIA delivered a clean sweep. But the more interesting story is who else showed up: MLPerf Training v5.0 marked AMD's first-ever training submission, and now tinycorp is on the board too, listed alongside AMD, Google, Azure, CoreWeave, and NVIDIA as one of 24 submitting organizations.

A 25,000-line stack that shouldn't be able to do this

The conventional wisdom in AI infrastructure is that you need millions of lines of battle-hardened software to run frontier training workloads. CUDA alone took NVIDIA nearly two decades to build: its ecosystem encompasses roughly 5.9 million developers, 18 years of accumulated libraries including cuDNN, cuBLAS, TensorRT, NCCL, and CUTLASS, and first-class integration with every major ML framework. Tinygrad's answer to all of that is a codebase that is, as Hotz put it, "1000x smaller."
