Summary by David Stutz 3 months ago
Bastani et al. propose formal robustness measures and an algorithm for approximating them for piece-wise linear networks. Specifically, the notion of robustness is similar to related work:
$\rho(f,x) = \inf\{\epsilon \geq 0 | f \text{ is not } (x,\epsilon)\text{-robust}$
where $(x,\epsilon)$-robustness demands that for every $x'$ with $\|x'-x\|_\infty$ it holds that $f(x') = f(x)$ – in other words, the label does not change for perturbations $\eta = x'-x$ which are small in terms of the $L_\infty$ norm and the constant $\epsilon$. Clearly, a higher $\epsilon$ implies a stronger notion of robustness. Additionally, the above definition is essentially a pointwise definition of robustness.
In order to measure robustness for the whole network (i.e. not only pointwise), the authors introduce the adversarial frequency:
$\psi(f,\epsilon) = p_{x\sim D}(\rho(f,x) \leq \epsilon)$.
This measure measures how often $f$ failes to be robust in the sense of $(x,\epsilon)$-robustness. The network is more robust when it has low adversarial frequency. Additionally, they introduce adversarial severity:
$\mu(f,\epsilon) = \mathbb{E}_{x\sim D}[\rho(f,x) | \rho(f,x) \leq \epsilon]$
which measures how severly $f$ fails to be robust (if it fails to be robust for a sample $x$).
Both above measures can be approximated by counting given that the robustness $\rho(f, x)$ is known for all samples $x$ in a separate test set. And this is the problem of the proposed measures: in order to approximate $\rho(f, x)$, the authors propose an optimization-based approach assuming that the neural network is piece-wise linear. This assumption is not necessarily unrealistic, dot products, convolutions, $\text{ReLU}$ activations and max pooling are all piece-wise linear. Even batch normalization is piece-wise linear at test time. The problem, however, is that th enetwork needs to be encoded in terms of linear programs, which I believe is cumbersome for real-world networks.
Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).

more
less