Summary by David Stutz 6 months ago
Raghunathan et al. provide an upper bound on the adversarial loss of two-layer networks and also derive a regularization method to minimize this upper bound. In particular, the authors consider the scoring functions $f^i(x) = V_i^T\sigma(Wx)$ with bounded derivative $\sigma'(z) \in [0,1]$ which holds for Sigmoid and ReLU activation functions. Still, the model is very constrained considering recent, well-performng deep (convolutional) neural networks. The upper bound is then derived by considering $f(A(x))$ where $A(x)$ is the optimal attacker $A(x) = \arg\max_{\tilde{x} \in B_\epsilon(x)} f(\tilde{x})$. For a linear model $f(x) = (W_1 – W_2)^Tx$, an upper bound can be derived as follows:
$f(\tilde{x}) = f(x) + (W_1 – W_2)^T(\tilde{x} – x) \leq f(x) + \epsilon\|W_1 – W_2\|_1$.
For two-layer networks a bound is derived by considering
$f(\tilde{x}) = f(x) + \int_0^1 \nabla f(t\tilde{x} + (1-t)x)^T (\tilde{x} – x) dt \leq f(x) + \max_{\tilde{x}\in B_\epsilon(x)} \epsilon\|\nabla f(\tilde{x})\|_1$.
In this case, Raghunathan rewrite the second term, i.e. $\max_{\tilde{x}\in B_\epsilon(x)} \epsilon\|\nabla f(\tilde{x})\|_1$ to derive an upper bound in the form of a semidefinite program, see the paper for details. For $v = V_1 – V_2$, this semidefinite program is based on the matrix
$M(v,W) = \left[\begin{array}0 & 0 & 1^T W^R \text{diag}(v)\\0 & 0 & W^T\text{diag}(v)\\ \text{diag}(v)^T W 1 & \text{diag}(v)^T W & 0\end{array}\right]$.
By deriving the dual objective, the upper bound can then be minimized by constraining the eigenvalues of $M(v, W)$ (specifically, the largest eigenvalue; note that the dual also involves dual variables – see the paper for details). Overall, the proposed regularize involves minimizing the largest eigenvalue of $M(v, W) – D$ where $D$ is a diagonal matrix based on the dual variables. In practice, this is implemented using SciPy's implementation of the Lanczos algorithm.
Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).

more
less