Advances in knowledge distillation have made it increasingly practical to compress Large Language Models (LLMs) into deployable Small Language Models (SLMs).
AdvDistill, a reward-guided dataset distillation framework, has been proposed to address the limitations of traditional distillation methods on reasoning tasks.
AdvDistill samples multiple responses per prompt from a teacher model, assigns each response a reward via rule-based verifiers, and uses these rewards to guide student training.
The study reports significant gains in student model performance on mathematical and complex reasoning tasks, underscoring the benefit of incorporating reward signals into dataset distillation.
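The core loop described above, sampling several teacher generations per prompt and scoring each with a rule-based verifier, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the hard-coded generations stand in for samples from a teacher LLM, and the verifier here is a simple final-number exact match against a gold answer.

```python
import re

# Hypothetical teacher generations for one math prompt; in practice these
# would be sampled from a teacher LLM (contents here are illustrative).
prompt = "What is 12 * 7?"
generations = [
    "12 * 7 = 84. The answer is 84.",
    "12 * 7 = 74. The answer is 74.",
    "Multiplying step by step gives 84.",
]
gold_answer = "84"

def rule_based_reward(response: str, gold: str) -> float:
    """Toy rule-based verifier: reward 1.0 if the last number in the
    response matches the gold answer, else 0.0."""
    numbers = re.findall(r"-?\d+", response)
    return 1.0 if numbers and numbers[-1] == gold else 0.0

# Pair each teacher generation with its reward; a student model could then
# be trained on these examples with the reward used, e.g., as a loss weight.
distilled = [
    {"prompt": prompt, "response": g, "reward": rule_based_reward(g, gold_answer)}
    for g in generations
]
```

In this sketch, incorrect generations receive zero reward and would contribute nothing (or less) to the student's training signal, which is the intuition behind reward-guided dataset distillation.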