CS2-10k is a new open dataset from Reka AI that packages over 600,000 first-person Counter-Strike 2 gameplay videos, totaling more than 10,000 hours of footage, with exact per-frame action annotations. Every frame in every clip is paired with the keyboard keys that were pressed, the mouse deltas that moved the camera, and the player's 3D world position at that instant. It is free to download on Hugging Face today under a CC BY-NC 4.0 license.

The data problem world models keep running into

Training interactive world models requires data that is notoriously hard to find: egocentric video sequences with densely aligned action signals (keyboard inputs, camera motion, and ego state), all synchronized to the visual stream. Real-world embodied data, such as sensor data from robotics or AR headsets, is expensive to collect and narrow in scope, while synthetic data is cheap but often lacks the visual richness or behavioral diversity needed for generalization.

Counter-Strike 2 turns out to be a surprisingly elegant solution to this problem. Because matches are recorded as deterministic replays, you can reconstruct clean first-person video at any point in a match and extract the precise control inputs that drove each visual change. The result is a perfectly labeled dataset at essentially zero annotation cost.

What's actually in the dataset

CS2-10k is built from public professional match demos sourced from HLTV. For each demo, Reka renders clean first-person video at 720p and 48 fps using the demo replay tool inside CS2, producing one video per player per round. With 10 players per match, every round yields 10 clips, so the count multiplies fast.
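To get a feel for the scale, here is a back-of-envelope calculation from the figures quoted in this article. The clip and hour counts are stated as lower bounds ("over" and "more than"), so treat every derived number as a rough estimate. The 20-round map in the last line is a made-up example, not a figure from the dataset.

```python
# Rough scale estimates from the article's headline figures.
# NUM_CLIPS and TOTAL_HOURS are lower bounds, so results are approximate.
NUM_CLIPS = 600_000        # "over 600,000" videos
TOTAL_HOURS = 10_000       # "more than 10,000 hours"
FPS = 48                   # render frame rate
PLAYERS_PER_MATCH = 10     # one video per player per round

total_seconds = TOTAL_HOURS * 3600
avg_clip_seconds = total_seconds / NUM_CLIPS    # average clip length
frames_per_avg_clip = avg_clip_seconds * FPS    # annotated frames per clip
total_frames = total_seconds * FPS              # annotated frames overall


def clips_per_map(rounds: int, players: int = PLAYERS_PER_MATCH) -> int:
    """Number of clips produced by one map: one per player per round."""
    return rounds * players


print(f"average clip length: {avg_clip_seconds:.0f} s")
print(f"frames per average clip: {frames_per_avg_clip:.0f}")
print(f"total annotated frames: {total_frames:,}")
print(f"clips from a hypothetical 20-round map: {clips_per_map(20)}")
```

On these numbers the average clip runs about a minute, and the dataset as a whole holds on the order of 1.7 billion annotated frames.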

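The annotation schema is dense and precise. Each clip ships with a .parquet file that holds one entry per frame. As described at the top, each entry records the keyboard keys that were pressed, the mouse deltas that moved the camera, and the player's 3D world position at that instant.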
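As an illustration only, one per-frame entry could be modeled roughly like this. The field names (frame_index, keys, mouse_dx, mouse_dy, position) and the sample values are assumptions made for this sketch, not the dataset's actual column names or data; check the dataset card on Hugging Face for the real schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

FPS = 48  # frame rate stated in the article


@dataclass(frozen=True)
class FrameAnnotation:
    """Hypothetical per-frame record; field names are illustrative."""
    frame_index: int
    keys: FrozenSet[str]                   # keyboard keys held on this frame
    mouse_dx: float                        # horizontal mouse delta
    mouse_dy: float                        # vertical mouse delta
    position: Tuple[float, float, float]   # player's 3D world position


def timestamp_seconds(frame: FrameAnnotation, fps: int = FPS) -> float:
    """Convert a frame index to a time offset within the clip."""
    return frame.frame_index / fps


def action_transitions(frames: List[FrameAnnotation]):
    """Pair each frame's action with the position change to the next frame."""
    pairs = []
    for cur, nxt in zip(frames, frames[1:]):
        delta = tuple(b - a for a, b in zip(cur.position, nxt.position))
        pairs.append((cur.keys, (cur.mouse_dx, cur.mouse_dy), delta))
    return pairs


# Made-up sample values, purely to exercise the functions above.
clip = [
    FrameAnnotation(0, frozenset({"W"}), 0.0, 0.0, (0.0, 0.0, 64.0)),
    FrameAnnotation(1, frozenset({"W"}), 1.5, -0.5, (4.0, 0.0, 64.0)),
    FrameAnnotation(2, frozenset({"W", "D"}), 0.0, 0.0, (8.0, 2.0, 64.0)),
]
pairs = action_transitions(clip)
for keys, mouse, delta in pairs:
    print(sorted(keys), mouse, delta)
```

Because the annotations are indexed by frame, turning a clip into (action, resulting motion) pairs is a matter of walking consecutive rows, which is the kind of supervision an interactive world model needs.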
