[link]
Gowal et al. propose interval bound propagation to obtain certified robustness against adversarial examples. In particular, given a neural network consisting of linear layers and monotonic increasing activation functions, a set of allowed perturbations is propagated to obtain upper and lower bounds at each layer. These lead to bounds on the logits of the network; these are used to verify whether the network changes its prediction on the allowed perturbations. Specifically, Gowal et al. consider an $L_\infty$ ball around input examples; the initial bounds are, thus, $\underline{z}_0 = x  \epsilon$ and $\overline{z}_0 = x + \epsilon$. For each layer, the bounds are defined as $\underline{z}_{k,i} = \min_{\underline{z}_{k – 1} \leq z_{k – 1} \leq \overline{z}_{k1}} e_i^T h_k(z_{k – 1})$ and the analogous maximization problem for the upper bound; here, $h$ denotes the applied layer. For Linear layers and monotonic activation functions, this is easy to solve, as shown in the paper. Moreover, computing these bounds is very efficient, only needing roughly two times the computation of one forward pass. During training, a combination of a clean loss and adversarial loss is used: $\kappa l(z_K, y) + (1  \kappa) l(\hat{z}_K, y)$ where $z_K$ are the logits of the input $x$, and $\hat{z}_K$ are the adversarial logits computed as $\hat{Z}_{K,y’} = \begin{cases} \overline{z}_{K,y’} & \text{if } y’ \neq y\\\underline{z}_{K,y} & \text{otherwise}\end{cases}$ Both $\epsilon$ and $\kappa$ are annealed during training. In experiments, it is shown that this method results in quite tight bounds on robustness. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
Your comment:
