#68
Mechanize
RL environment and evaluation vendor selling to frontier AI labs. Builds simulated digital workplaces (email, Slack, code editors, and browsers) in which agents complete software engineering tasks and receive graded rewards. Its GBA Eval benchmarks coding agents by having them write a Game Boy Advance emulator from scratch in 24 hours.
Categories
Subcategories
LONG CONTEXT