Parseval Networks: Improving Robustness to Adversarial Examples
Cissé, Moustapha; Bojanowski, Piotr; Grave, Edouard; Dauphin, Yann; Usunier, Nicolas. International Conference on Machine Learning, 2017.
Paper summary by davidstutz

Cisse et al. propose Parseval networks, deep neural networks regularized to learn orthonormal weight matrices. Similar to the work by Hein et al. [1], the main idea is to constrain the Lipschitz constant of the network; since the Lipschitz constant of a composition is bounded by the product of the per-layer constants, it suffices to constrain the Lipschitz constant of each layer independently. For weight matrices, this can be achieved by constraining the matrix norm. However, depending on the norm used, this is often intractable during gradient descent training. Therefore, Cisse et al. propose a per-layer regularizer of the form:
$R(W) = \|W^T W - I\|$
where $I$ is the identity matrix. During training, this regularizer pushes the learned weight matrices towards orthonormality, an efficient alternative to full matrix manifold optimization techniques (see the paper for details).
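A minimal sketch of how such a per-layer penalty could be added to the training loss in PyTorch is given below; the function names, the coefficient `beta`, and the restriction to fully connected layers are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

def orthonormality_penalty(W: torch.Tensor, beta: float = 1e-4) -> torch.Tensor:
    """Hypothetical helper: beta/2 * ||W^T W - I||_F^2 for a 2D weight matrix W."""
    WtW = W.t() @ W
    I = torch.eye(WtW.shape[0], device=W.device, dtype=W.dtype)
    return 0.5 * beta * (WtW - I).pow(2).sum()

def regularized_loss(model: nn.Module, task_loss: torch.Tensor) -> torch.Tensor:
    # Sum the penalty over all fully connected layers and add it to the task loss.
    penalty = sum(orthonormality_penalty(m.weight)
                  for m in model.modules() if isinstance(m, nn.Linear))
    return task_loss + penalty
```

With this formulation, the orthonormality constraint is enforced only softly through gradient descent on the combined loss, rather than by projecting the weights back onto the manifold of orthonormal matrices after each update.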
[1] Matthias Hein, Maksym Andriushchenko: Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation. CoRR abs/1705.08475 (2017)
Also see this summary at [davidstutz.de](https://davidstutz.de/category/reading/).