Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation

  • Advances in self-distillation have shown that when knowledge is distilled from a teacher to a student using the same deep learning (DL) architecture, the student's performance can surpass the teacher's, particularly when the network is overparameterized and the teacher is trained with early stopping.
  • This paper proposes to train only one model and to generate multiple diverse teacher representations using distillation-time dropout.
  • To cope with these noisy representations, a novel stochastic self-distillation (SSD) training strategy is introduced that uses student-guided knowledge distillation (SGKD) to filter and weight the teacher representations (an illustrative sketch follows this list).
  • Experimental results show that the proposed SSD method outperforms state-of-the-art methods on various datasets without increasing the model size and with negligible additional computational overhead.

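The sketch below illustrates the general idea in PyTorch, under stated assumptions: dropout is kept active at distillation time so each teacher forward pass yields a different stochastic representation, and each sample is weighted by its agreement with the student's current representation. The network architecture, dropout rate, and cosine-similarity weighting are illustrative choices, not the authors' SSD/SGKD implementation, whose exact filtering and weighting rules are defined in the paper.

```python
# Illustrative sketch only (assumption: not the authors' exact SSD/SGKD code).
# Idea: sample several stochastic teacher representations via dropout, then
# weight them by agreement with the student before computing the loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Small feature extractor with dropout; same architecture for teacher and student."""
    def __init__(self, in_dim=32, feat_dim=16, p_drop=0.3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.body(x)

def student_guided_distill_loss(teacher, student, x, num_samples=5, temperature=1.0):
    """Sample stochastic teacher representations with dropout active and weight
    them by cosine similarity to the student's representation (hypothetical weighting)."""
    teacher.train()                                # keep dropout active -> stochastic teacher
    student_feat = student(x)                      # (B, D)

    with torch.no_grad():
        samples = torch.stack([teacher(x) for _ in range(num_samples)])  # (K, B, D)

    # Student-guided weights: teacher samples that agree with the student get more weight.
    sims = F.cosine_similarity(samples, student_feat.unsqueeze(0), dim=-1)  # (K, B)
    weights = F.softmax(sims / temperature, dim=0).unsqueeze(-1)            # (K, B, 1)

    target = (weights * samples).sum(dim=0)        # weighted teacher representation, (B, D)
    return F.mse_loss(student_feat, target)

if __name__ == "__main__":
    torch.manual_seed(0)
    teacher, student = Net(), Net()
    x = torch.randn(8, 32)
    loss = student_guided_distill_loss(teacher, student, x)
    loss.backward()                                # gradients reach the student only
    print(f"distillation loss: {loss.item():.4f}")
```

Because the teacher samples are drawn under `torch.no_grad()`, only one model's worth of extra forward passes is needed and no additional parameters are introduced, matching the paper's claim of negligible overhead and unchanged model size.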
Read Full Article
