How Can We Be So Dense? The Benefits of Using Highly Sparse Representations on ShortScience.org


How Can We Be So Dense? The Benefits of Using Highly Sparse Representations
Ahmad, Subutai and Scheinkman, Luiz
arXiv e-Print archive - 2019 via Local Bibsonomy
Keywords: dblp

Summaries/Notes 1

Summary by David Stutz 4 years ago

Ahmad and Scheinkman propose a simple sparse layer in order to improve robustness against random noise. Specifically, considering a general linear network layer, i.e.

$\hat{y}^l = W^l y^{l-1} + b^l$ and $y^l = f(\hat{y}^l)$

where $f$ is an activation function, the weights $W^l$ are first initialized using a sparse distribution; then, the activation function (commonly ReLU) is replaced by a top-$k$ ReLU variant that propagates only the $k$ largest activations and sets the rest to zero. In experiments, this is shown to improve robustness against random noise on MNIST.
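To make the two ingredients concrete, here is a minimal NumPy sketch of such a layer. It is not the authors' implementation: the function names (`sparse_weights`, `top_k_relu`, `sparse_layer`), the `density` parameter, and the choice of a masked Gaussian as the "sparse distribution" are all assumptions made for illustration.

```python
import numpy as np

def sparse_weights(rng, n_out, n_in, density=0.3):
    """Gaussian weights with a random subset kept non-zero.

    Assumption: the 'sparse distribution' is modeled as a dense Gaussian
    multiplied by a random binary mask; the paper may use a different scheme.
    """
    w = rng.normal(0.0, 1.0 / np.sqrt(n_in * density), size=(n_out, n_in))
    mask = rng.random((n_out, n_in)) < density  # keep roughly `density` of the entries
    return w * mask

def top_k_relu(y_hat, k):
    """Keep the k largest pre-activations, apply ReLU to them, zero the rest."""
    out = np.zeros_like(y_hat)
    idx = np.argpartition(y_hat, -k)[-k:]   # indices of the k largest entries
    out[idx] = np.maximum(y_hat[idx], 0.0)  # ReLU on the winners only
    return out

def sparse_layer(W, b, y_prev, k):
    """y^l = f(W^l y^{l-1} + b^l), with f the top-k ReLU above."""
    return top_k_relu(W @ y_prev + b, k)

# Usage: a 100-unit layer on a 784-dimensional input (MNIST-sized), k = 10.
rng = np.random.default_rng(0)
W = sparse_weights(rng, n_out=100, n_in=784, density=0.3)
b = np.zeros(100)
y = sparse_layer(W, b, rng.random(784), k=10)
```

With this construction at most $k$ units are non-zero in the output, however large the input perturbation is.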

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).

