Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over extremely long inputs. Recent advances in large reasoning models (LRMs) driven by reinforcement learning (RL) have improved problem-solving, but scaling that reasoning to longer contexts has remained a challenge, and closing that gap is what prompted the development of QwenLong-L1.

The framework targets long-context reasoning RL, which requires models to handle lengthy inputs accurately. Training proceeds through structured phases: warm-up supervised fine-tuning followed by curriculum-guided phased RL. To score model outputs during training, QwenLong-L1 uses a hybrid reward system that combines rule-based verification with a semantic comparison approach.

In evaluations on document question-answering tasks, QwenLong-L1 showed promising performance across benchmarks. Models trained with the framework develop specialized long-context reasoning behaviors such as 'grounding' and 'subgoal setting.' Its applications span legal tech, finance, and customer service, enhancing AI utility in enterprise settings. The researchers have made the QwenLong-L1 code and trained model weights publicly available.
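To make the hybrid reward idea concrete, here is a minimal sketch of how a rule-based check might be combined with a semantic fallback. This is an illustration, not the paper's implementation: the function names are hypothetical, and the token-overlap F1 score stands in for the LLM-based semantic judge the framework actually uses.

```python
import re

def rule_based_reward(prediction: str, reference: str) -> float:
    """Rule-based verification: exact match after light normalization."""
    norm = lambda s: re.sub(r"\s+", " ", s.strip().lower())
    return 1.0 if norm(prediction) == norm(reference) else 0.0

def semantic_reward(prediction: str, reference: str) -> float:
    """Cheap stand-in for an LLM judge: token-overlap F1 between answers."""
    pred, ref = set(prediction.lower().split()), set(reference.lower().split())
    if not pred or not ref:
        return 0.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def hybrid_reward(prediction: str, reference: str) -> float:
    """Combine both signals: a verified exact match scores full reward;
    otherwise fall back to the semantic comparison score."""
    return max(rule_based_reward(prediction, reference),
               semantic_reward(prediction, reference))
```

For example, `hybrid_reward("42 million", "42 million")` returns the full 1.0 via the rule-based path, while a paraphrased answer that fails exact match still earns partial credit from the semantic fallback, which is the motivation for combining the two signals.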