Automated scoring systems used in large-scale assessment have traditionally required large quantities of hand-scored data to make accurate predictions.
Generative Large Language Models can generalize to new tasks with little to no data, but their scoring accuracy still benefits substantially from fine-tuning.
The proposed model distillation pipeline, named 'Cyborg Data', combines human- and machine-scored responses during training.
Student models trained on 'Cyborg Data' achieve performance comparable to models trained on the full hand-scored dataset, while requiring only 10% of the original hand-scored data.
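A minimal sketch of how such a combined training set might be constructed: a small hand-scored slice is retained, and a teacher model pseudo-labels the remainder. All names (`build_cyborg_data`, `teacher`, `human_fraction`) are illustrative assumptions, not the paper's actual implementation.

```python
def build_cyborg_data(responses, human_scores, teacher, human_fraction=0.1):
    """Combine human-scored and machine-scored (teacher-labeled) examples.

    responses: list of response texts
    human_scores: dict mapping a subset of indices to gold (human) scores
    teacher: callable scoring a response; stands in for a fine-tuned LLM
    human_fraction: share of the dataset kept with human labels
    """
    # Keep only a small hand-scored slice, per the 10% claim above.
    n_human = max(1, int(len(responses) * human_fraction))
    human_idx = sorted(human_scores)[:n_human]
    data = [(responses[i], human_scores[i], "human") for i in human_idx]

    # Let the teacher model pseudo-label everything else.
    kept = set(human_idx)
    for i, text in enumerate(responses):
        if i not in kept:
            data.append((text, teacher(text), "machine"))
    return data
```

A student model would then be trained on `data` as if all labels were gold, which is the core idea of distillation with mixed supervision.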