Frankle and Carbin discover so-called winning tickets: subsets of the weights of a neural network that are sufficient to obtain state-of-the-art accuracy. The lottery ticket hypothesis states that dense networks contain subnetworks (the winning tickets) that can reach the same accuracy when trained in isolation, from scratch. The key insight is that these subnetworks seem to have received an optimal initialization. To find them, a complex network is first trained on, e.g., CIFAR, and its weights are then pruned based on their absolute value, i.e., weights with small absolute value are pruned first. The remaining network is trained from scratch using the original initialization and reaches competitive performance using less than 10% of the original weights. As soon as the subnetwork is re-initialized, however, these results cannot be reproduced. This suggests that these subnetworks obtained some sort of “optimal” initialization for learning. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
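
The pruning-and-reset step described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `magnitude_mask` and `winning_ticket` are hypothetical, a single weight matrix stands in for a whole network, pruning is done in one shot, and the "trained" weights are simulated by perturbing the initialization instead of running actual training.

```python
import numpy as np


def magnitude_mask(trained_weights, keep_fraction):
    """Return a 0/1 mask that keeps the largest-magnitude trained weights."""
    magnitudes = np.abs(trained_weights)
    # Number of weights to keep (at least one).
    k = max(1, int(round(keep_fraction * magnitudes.size)))
    # The k-th largest absolute value serves as the pruning threshold.
    threshold = np.sort(magnitudes.ravel())[-k]
    return (magnitudes >= threshold).astype(trained_weights.dtype)


def winning_ticket(initial_weights, trained_weights, keep_fraction):
    """Prune by trained magnitude, then reset survivors to their initial values."""
    mask = magnitude_mask(trained_weights, keep_fraction)
    return mask, initial_weights * mask


rng = np.random.default_rng(0)
w_init = rng.normal(size=(4, 5))                          # original initialization
w_trained = w_init + rng.normal(scale=0.5, size=(4, 5))   # stand-in for training

# Keep 10% of the weights and rewind them to the original initialization.
mask, w_ticket = winning_ticket(w_init, w_trained, keep_fraction=0.1)

# Control from the summary: same sparsity pattern, but freshly re-initialized.
w_reinit = rng.normal(size=w_init.shape) * mask
```

In an actual experiment, `w_ticket` and `w_reinit` would each be trained further with the mask held fixed, and the summary's claim is that only the former reaches competitive accuracy.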