[link]
Ma et al. detect adversarial examples based on their estimated intrinsic dimensionality. I want to note that this work is also similar to [1] – in both publications, local intrinsic dimensionality is used to analyze adversarial examples. Specifically, the intrinsic dimensionality of a sample is estimated based on the radii $r_i(x)$ of the $k$nearest neighbors around a sample $x$: $ \left(\frac{1}{k} \sum_{i = 1}^k \log \frac{r_i(x)}{r_k(x)}\right)^{1}$. For details regarding the original, theoretical formulation of local intrinsic dimensionality I refer to the paper. In experiments, the authors show that adversarial examples exhibit a significant higher intrinsic dimensionality than training samples or randomly perturbed examples. This observation allows detection of adversarial examples. A proper interpretation of this finding is, however, missing. It would be interesting to investigate what this finding implies about the properties of adversarial examples.
Your comment:
