Rule-based reasoning is a fundamental problem, but the variation in rule formats and the complexity of real-world applications make it challenging.
Large reasoning models enhanced by reinforcement learning have shown remarkable reasoning capabilities.
However, whether small reasoning models can effectively learn rule-based reasoning that generalizes across tasks and domains remains an open question.
A method called RuleReasoner is introduced, which learns rule-based reasoning by training on a wide range of tasks and selecting training data with domain-aware dynamic sampling.
RuleReasoner resamples each training batch by updating the sampling weights of different domains based on their historical rewards, which facilitates domain augmentation and flexible learning schedules.
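The reward-driven batch resampling described above can be illustrated with a small sketch. This is not RuleReasoner's actual update rule: the class name `DomainAwareSampler`, the exponential moving average (EMA) of per-domain rewards, the neutral initial reward of 0.5, and the softmax over (1 − historical reward) with a temperature are all hypothetical choices made here. They encode one plausible reading of the idea that domains with lower historical rewards should be sampled more often. The domain names in the usage lines are placeholders.

```python
import math
import random


class DomainAwareSampler:
    """Hypothetical sketch of domain-aware dynamic sampling.

    Assumptions (not from the source): rewards lie in [0, 1], historical
    reward is an EMA per domain, and weights are a softmax over
    (1 - historical reward) / temperature.
    """

    def __init__(self, domains, temperature=0.5, ema_decay=0.9, seed=0):
        self.domains = list(domains)
        self.temperature = temperature
        self.ema_decay = ema_decay
        # Assumed neutral starting point for every domain.
        self.hist_reward = {d: 0.5 for d in self.domains}
        self.rng = random.Random(seed)

    def update(self, domain_rewards):
        """Fold the latest batch's rewards (domain -> list of floats) into the EMA."""
        for d, rewards in domain_rewards.items():
            if rewards:
                batch_mean = sum(rewards) / len(rewards)
                self.hist_reward[d] = (self.ema_decay * self.hist_reward[d]
                                       + (1 - self.ema_decay) * batch_mean)

    def weights(self):
        """Lower historical reward -> higher sampling weight (numerically stable softmax)."""
        scores = [(1.0 - self.hist_reward[d]) / self.temperature for d in self.domains]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return {d: e / z for d, e in zip(self.domains, exps)}

    def sample_batch(self, pools, batch_size):
        """Draw (domain, example) pairs; pools maps each domain to its example list."""
        w = self.weights()
        chosen = self.rng.choices(self.domains,
                                  weights=[w[d] for d in self.domains],
                                  k=batch_size)
        return [(d, self.rng.choice(pools[d])) for d in chosen]


# Usage with placeholder domains: "domain_b" scored lowest in the last batch,
# so after the update it receives the largest sampling weight.
sampler = DomainAwareSampler(["domain_a", "domain_b", "domain_c"])
sampler.update({"domain_a": [1.0, 1.0, 0.0], "domain_b": [0.0, 0.0], "domain_c": [1.0]})
batch = sampler.sample_batch({"domain_a": [1, 2], "domain_b": [3], "domain_c": [4]},
                             batch_size=8)
```

Keeping the weights as a softmax means no domain's probability drops to exactly zero, so domains the model has mostly mastered are still revisited occasionally. The temperature controls how sharply sampling concentrates on low-reward domains.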
Empirical evaluations on in-distribution and out-of-distribution benchmarks show that RuleReasoner outperforms leading large reasoning models and existing methods by a significant margin.
The approach is also more computationally efficient than previous dynamic sampling methods for reinforcement learning.