Distilling the Knowledge in a Neural Network
#### Problem addressed:
Traditional classifiers are trained on hard targets. This not only forces the network to learn a very sharp (spiky) function but also discards the relative similarity between classes; for example, a truck is far more likely to be misclassified as a car than as a cat, yet the classifier is forced to assign both the car and the cat the same target value. This leads to poor generalization. This paper addresses this problem.

#### Summary:
To address the problems above, the paper proposes generating soft labels for each sample by first training a cumbersome/large/complex (teacher) classifier, such as a dropout-regularized network, and running its softmax at a high "temperature" so that it produces soft probabilities representing each sample's membership in every class. A vanilla (student) NN is then trained on these generated soft labels at the same high temperature, and used at a low temperature after training, on either the same training data or a separate transfer set. By doing so, the simpler student model performs similarly to the complex teacher model. A minimal sketch of the distillation objective is given after this summary.

#### Novelty:
A technique for generating soft class labels so that a much simpler classifier can be trained in place of the large and complex models (e.g., dropout-regularized networks or conv-nets) currently used.

#### Drawbacks:
I believe a major drawback of this paper is that it still entails learning a complex classifier to generate the soft labels. Another drawback is that it is incapable of using unlabeled data.

#### Datasets:
MNIST, JFT (internal Google image dataset)

#### Additional remarks:

#### Resources:
https://www.youtube.com/watch?v=7kAlBa7yhDM

#### Presenter: Devansh Arpit
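Below is a minimal sketch of the distillation objective the summary describes, not the authors' code: it assumes PyTorch, and the function name `distillation_loss`, the temperature `T`, and the weighting `alpha` are illustrative choices rather than values from the paper.

```python
# Minimal sketch of knowledge distillation: train a student on the teacher's
# temperature-softened probabilities plus the usual hard-label loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, hard_targets, T=4.0, alpha=0.9):
    # Soft targets: teacher probabilities softened by temperature T.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    # KL divergence between the softened student and teacher distributions,
    # scaled by T^2 so soft-target gradients stay comparable across temperatures.
    soft_loss = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                         soft_targets, reduction="batchmean") * (T * T)
    # Standard cross-entropy on the true (hard) labels at temperature 1.
    hard_loss = F.cross_entropy(student_logits, hard_targets)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Example usage with random logits for a 10-class problem (hypothetical values).
student_logits = torch.randn(32, 10, requires_grad=True)
teacher_logits = torch.randn(32, 10)
labels = torch.randint(0, 10, (32,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

At test time the student is used at temperature 1, i.e., as an ordinary softmax classifier.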
Distilling the Knowledge in a Neural Network
Hinton, Geoffrey E. and Vinyals, Oriol and Dean, Jeffrey
arXiv e-Print archive - 2015 via Local Bibsonomy
Keywords: dblp