How Can We Be So Dense? The Benefits of Using Highly Sparse Representations
Ahmad, Subutai and Scheinkman, Luiz (2019)

Paper summary by davidstutz

Ahmad and Scheinkman propose a simple sparse layer to improve robustness against random noise. Specifically, considering a general linear network layer, i.e.
$\hat{y}^l = W^l y^{l-1} + b^l$ and $y^l = f(\hat{y}^l)$
where $f$ is an activation function, the weights are first initialized from a sparse distribution; then the activation function (commonly ReLU) is replaced by a top-$k$ ReLU variant that propagates only the $k$ largest activations. In experiments, this is shown to improve robustness against random noise on MNIST.
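To make the layer concrete, here is a minimal pure-Python sketch of the forward pass $y^l = f(W^l y^{l-1} + b^l)$ with a top-$k$ ReLU. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sparse_weights`, `top_k_relu`, `sparse_layer`), the `density` parameter, the per-row sparsity pattern, and the uniform initialization range are all hypothetical choices, since the summary does not specify the sparse distribution used.

```python
import random


def sparse_weights(n_out, n_in, density, rng):
    """Random weight matrix where each row keeps only a `density` fraction of entries.

    Assumption: nonzero weights are drawn uniformly from [-1, 1]; the summary
    does not say which sparse distribution the paper uses.
    """
    n_keep = max(1, round(density * n_in))
    weights = []
    for _ in range(n_out):
        row = [0.0] * n_in
        # Pick which input connections stay nonzero for this output unit.
        for j in rng.sample(range(n_in), n_keep):
            row[j] = rng.uniform(-1.0, 1.0)
        weights.append(row)
    return weights


def top_k_relu(pre_activations, k):
    """Apply ReLU to the k largest pre-activations and zero out all others."""
    order = sorted(
        range(len(pre_activations)),
        key=lambda i: pre_activations[i],
        reverse=True,
    )
    winners = set(order[:k])
    return [
        max(v, 0.0) if i in winners else 0.0
        for i, v in enumerate(pre_activations)
    ]


def sparse_layer(weights, bias, y_prev, k):
    """Compute y^l = f(W^l y^{l-1} + b^l) with f the top-k ReLU."""
    y_hat = [
        sum(w * x for w, x in zip(row, y_prev)) + b
        for row, b in zip(weights, bias)
    ]
    return top_k_relu(y_hat, k)


rng = random.Random(0)
W = sparse_weights(n_out=8, n_in=16, density=0.25, rng=rng)
b = [0.0] * 8
y_prev = [rng.random() for _ in range(16)]
y = sparse_layer(W, b, y_prev, k=3)
```

With `k=3`, at most three of the eight entries of `y` can be nonzero, so both the weights and the activations are sparse. In an array framework the same selection would typically be done with a built-in top-$k$ routine rather than a full sort.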
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
