How Can We Be So Dense? The Benefits of Using Highly Sparse Representations
Paper summary: Ahmad and Scheinkman propose a simple sparse layer to improve robustness against random noise. Specifically, consider a general linear network layer, i.e. $\hat{y}^l = W^l y^{l-1} + b^l$ and $y^l = f(\hat{y}^l)$ where $f$ is an activation function. The weights are first initialized from a sparse distribution; then the activation function (commonly ReLU) is replaced by a top-$k$ ReLU variant in which only the $k$ largest activations are propagated and all others are set to zero. In experiments, this is shown to improve robustness against random noise on MNIST. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
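
A minimal sketch of the idea in PyTorch, assuming a hypothetical `SparseLinear` layer with illustrative `weight_sparsity` and `k` parameters (these names are not from the authors' code): weights are randomly zeroed at initialization, and the forward pass keeps only the $k$ largest pre-activations per example.

```python
import torch
import torch.nn as nn


class SparseLinear(nn.Module):
    """Sketch of a linear layer with sparse weights and a top-k activation.

    `weight_sparsity` is the fraction of weights zeroed at initialization and
    `k` is the number of units allowed to fire per example; both names are
    illustrative, not taken from the paper's implementation.
    """

    def __init__(self, in_features, out_features, weight_sparsity=0.5, k=50):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.k = k
        # Sparse initialization: zero a random fraction of the weights so each
        # unit keeps only a sparse set of incoming connections.
        with torch.no_grad():
            mask = (torch.rand_like(self.linear.weight) > weight_sparsity).float()
            self.linear.weight.mul_(mask)

    def forward(self, x):
        # \hat{y}^l = W^l y^{l-1} + b^l
        pre = self.linear(x)
        # Top-k ReLU: keep only the k largest pre-activations per example
        # (rectified), zero everything else.
        values, indices = pre.topk(self.k, dim=1)
        out = torch.zeros_like(pre)
        out.scatter_(1, indices, torch.relu(values))
        return out


# Example usage on MNIST-sized inputs: 784 inputs, 256 hidden units, of which
# at most 50 are active per example.
layer = SparseLinear(784, 256, weight_sparsity=0.5, k=50)
y = layer(torch.randn(32, 784))
```

Note that this sketch only sparsifies the weights once at initialization; a faithful implementation would also re-apply the sparsity mask after each gradient update so pruned connections stay at zero.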
Ahmad, Subutai and Scheinkman, Luiz
arXiv e-Print archive - 2019 via Local Bibsonomy


Summary by David Stutz 1 month ago


