Certified Defenses against Adversarial Examples Certified Defenses against Adversarial Examples
Paper summary Raghunathan et al. provide an upper bound on the adversarial loss of two-layer networks and also derive a regularization method to minimize this upper bound. In particular, the authors consider the scoring functions $f^i(x) = V_i^T\sigma(Wx)$ with bounded derivative $\sigma'(z) \in [0,1]$ which holds for Sigmoid and ReLU activation functions. Still, the model is very constrained considering recent, well-performng deep (convolutional) neural networks. The upper bound is then derived by considering $f(A(x))$ where $A(x)$ is the optimal attacker $A(x) = \arg\max_{\tilde{x} \in B_\epsilon(x)} f(\tilde{x})$. For a linear model $f(x) = (W_1 – W_2)^Tx$, an upper bound can be derived as follows: $f(\tilde{x}) = f(x) + (W_1 – W_2)^T(\tilde{x} – x) \leq f(x) + \epsilon\|W_1 – W_2\|_1$. For two-layer networks a bound is derived by considering $f(\tilde{x}) = f(x) + \int_0^1 \nabla f(t\tilde{x} + (1-t)x)^T (\tilde{x} – x) dt \leq f(x) + \max_{\tilde{x}\in B_\epsilon(x)} \epsilon\|\nabla f(\tilde{x})\|_1$. In this case, Raghunathan rewrite the second term, i.e. $\max_{\tilde{x}\in B_\epsilon(x)} \epsilon\|\nabla f(\tilde{x})\|_1$ to derive an upper bound in the form of a semidefinite program, see the paper for details. For $v = V_1 – V_2$, this semidefinite program is based on the matrix $M(v,W) = \left[\begin{array}0 & 0 & 1^T W^R \text{diag}(v)\\0 & 0 & W^T\text{diag}(v)\\ \text{diag}(v)^T W 1 & \text{diag}(v)^T W & 0\end{array}\right]$. By deriving the dual objective, the upper bound can then be minimized by constraining the eigenvalues of $M(v, W)$ (specifically, the largest eigenvalue; note that the dual also involves dual variables – see the paper for details). Overall, the proposed regularize involves minimizing the largest eigenvalue of $M(v, W) – D$ where $D$ is a diagonal matrix based on the dual variables. In practice, this is implemented using SciPy's implementation of the Lanczos algorithm. Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
arxiv.org
arxiv-sanity.com
scholar.google.com
Certified Defenses against Adversarial Examples
Aditi Raghunathan and Jacob Steinhardt and Percy Liang
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.LG

more

Summary by David Stutz 3 months ago
Loading...
Your comment:


ShortScience.org allows researchers to publish paper summaries that are voted on and ranked!
About

Sponsored by: and