Self-Taught Reasoners (STaR), also known as Rejection sampling Fine-Tuning (RFT), is an integral part of the training pipeline for self-improving reasoning Language Models (LMs).
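To make the setup concrete, here is a minimal sketch of one STaR/RFT iteration. The helpers `generate_fn` (samples a rationale and final answer), `check_fn` (verifies the answer), and `finetune_fn` (supervised fine-tuning on accepted traces) are hypothetical placeholders introduced for illustration, not names from the paper or any specific library.

```python
import random

def star_iteration(model, dataset, generate_fn, check_fn, finetune_fn, num_samples=4):
    """One STaR / RFT round: sample rationales, keep only those whose final
    answer is correct, then fine-tune on the accepted traces."""
    accepted = []
    # Random observation sampling: every (question, answer) pair is visited in a random order.
    for question, answer in random.sample(dataset, k=len(dataset)):
        for _ in range(num_samples):
            rationale, prediction = generate_fn(model, question)
            if check_fn(prediction, answer):  # rejection step: discard incorrect reasoning traces
                accepted.append((question, rationale, answer))
                break                         # keep one correct trace per observation in this sketch
    return finetune_fn(model, accepted)       # supervised fine-tuning on self-generated data
```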
However, random observation (data) sampling often produces imbalanced training across observations: the model is over-trained on easy, already-solved examples and under-trained on challenging ones.
AdaSTaR is a new algorithm that addresses this issue through two adaptive sampling principles: Adaptive Sampling for Diversity, which promotes balanced training across observations, and Adaptive Sampling for Curriculum, which dynamically adjusts the difficulty of sampled data to match the model's evolving strength.
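The sketch below illustrates how these two principles could be combined into a sampling distribution over observations. It is an assumption-laden illustration, not the paper's exact algorithm: the bookkeeping signals `train_count` (how often each observation has been trained on), `solve_rate` (fraction of past attempts solved per observation), and `model_strength` (e.g., the model's recent overall solve rate) are introduced here purely for exposition.

```python
import numpy as np

def adaptive_sampling_weights(train_count, solve_rate, model_strength):
    """Return a sampling distribution over observations (illustrative sketch).

    Diversity: observations trained on less often get higher weight,
    counteracting over-training on easy, already-solved examples.
    Curriculum: weight is concentrated on observations whose difficulty
    (1 - solve_rate) roughly matches the model's current strength, so harder
    data is favored as the model improves.
    """
    train_count = np.asarray(train_count, dtype=float)
    solve_rate = np.asarray(solve_rate, dtype=float)

    diversity = 1.0 / (1.0 + train_count)                    # favor under-trained observations
    difficulty = 1.0 - solve_rate                            # low solve rate -> hard observation
    curriculum = 1.0 - np.abs(difficulty - model_strength)   # difficulty near current strength
    weights = diversity * np.clip(curriculum, 1e-6, None)
    return weights / weights.sum()

# Usage (hypothetical): draw a training batch with these weights instead of uniformly.
# batch_idx = np.random.choice(len(weights), size=64, replace=False, p=weights)
```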
Across six benchmarks, AdaSTaR achieves the best test accuracy in all instances (6/6) and reduces training FLOPs by an average of 58.6%, pointing toward more efficient and effective self-improving LMs.