<ul><li>Improving open-source models on real-world SWE tasks (solving GitHub issues) faces two challenges: scalable curation of execution environments and optimal scaling of test-time compute.</li><li>AgentGym is introduced as the largest procedurally curated executable gym environment for training real-world SWE agents, with over 8.7K tasks.</li><li>SYNGEN, a synthetic data curation recipe, enables scalable curation of executable environments, which in turn improves training performance.</li><li>Hybrid Test-time Scaling combines execution-based and execution-free verifiers, which show complementary strengths and limitations.</li></ul>

R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents
