Large Language Models (LLMs) are difficult to deploy in resource-constrained environments because of their high computational demands.
A study investigated whether LLMs can be compressed using Knowledge Distillation (KD) without compromising performance on Question Answering (QA) tasks.
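The summary does not specify the distillation objective used, so the snippet below is only a minimal sketch of the standard soft-target KD loss (in the Hinton et al. style) commonly applied when distilling a smaller student from a larger teacher; the `temperature` and `alpha` values are illustrative assumptions, not reported hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Sketch of a soft-target distillation loss.

    Combines a KL-divergence term between temperature-softened teacher and
    student distributions with ordinary cross-entropy on the gold labels.
    Hyperparameters here are placeholders, not the study's settings.
    """
    # Soft targets: KL divergence between softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    kd_term = kd_term * (temperature ** 2)

    # Hard targets: standard cross-entropy against the gold answer tokens.
    ce_term = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    return alpha * kd_term + (1.0 - alpha) * ce_term
```

In this formulation, `alpha` trades off imitating the teacher's output distribution against fitting the ground-truth answers directly.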
On the SQuAD and MLQA benchmarks, student models distilled from the Pythia and Qwen2.5 families retained over 90% of their teacher models' performance while using up to 57.1% fewer parameters.
One-shot prompting yielded additional gains over zero-shot setups, highlighting the potential of combining KD with minimal prompting to build efficient QA systems for resource-constrained applications.
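As a rough illustration of the zero-shot versus one-shot distinction, a prompt builder for extractive QA might look like the following sketch; the template wording and field names are assumptions for illustration, not the study's actual prompt format.

```python
def build_qa_prompt(context, question, example=None):
    """Assemble a zero-shot or one-shot extractive QA prompt.

    `example` is an optional (context, question, answer) triple; when
    provided, it is prepended as a single worked demonstration (one-shot).
    The template below is illustrative only.
    """
    parts = []
    if example is not None:
        ex_context, ex_question, ex_answer = example
        parts.append(
            f"Context: {ex_context}\nQuestion: {ex_question}\nAnswer: {ex_answer}\n"
        )
    parts.append(f"Context: {context}\nQuestion: {question}\nAnswer:")
    return "\n".join(parts)
```

Passing `example=None` produces the zero-shot prompt; supplying one demonstration produces the one-shot variant compared in the study.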