Ilyas et al. propose three query-efficient black-box adversarial example attacks based on distribution-based gradient estimation. In particular, their simplest attack involves estimating the gradient locally using a search distribution: $\nabla_x \mathbb{E}_{\pi(\theta|x)} [F(\theta)] = \mathbb{E}_{\pi(\theta|x)} [F(\theta) \nabla_x \log(\pi(\theta|x))]$ where $F(\cdot)$ is a loss function – e.g., the cross-entropy loss, which is maximized to obtain an adversarial example. With a Gaussian noise search distribution, the above equation leads to a simple estimator for the gradient: $\nabla \mathbb{E}[F(\theta)] \approx \frac{1}{\sigma n} \sum_{i = 1}^n \delta_i F(\theta + \sigma \delta_i)$ where $\sigma$ is the search variance and the $\delta_i$ are sampled from a unit Gaussian. This estimate can then be plugged into projected gradient descent, as used in white-box attacks, to obtain adversarial examples. The attack as described assumes that the black-box network provides probability outputs in order to compute the loss $F$. In the remainder of the paper, the authors generalize this approach to the label-only case, where the network only provides the top $k$ labels for each input. In experiments, the attacks are shown to be effective while rarely requiring more than $50$k queries on ImageNet. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
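As a minimal sketch of the estimator above, the following NumPy snippet estimates the gradient of a loss from function evaluations only and takes one projected gradient step. The toy quadratic `loss_fn` stands in for the black-box network's loss, and the antithetic $\pm\delta_i$ sampling and the step sizes are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

def nes_gradient(loss_fn, x, sigma=0.01, n=500, rng=None):
    """Estimate the gradient of E[loss] at x using a Gaussian search
    distribution: average delta_i * F(x + sigma * delta_i) over samples.
    Antithetic pairs (+delta, -delta) are used to reduce variance."""
    rng = np.random.default_rng(0) if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(n):
        delta = rng.standard_normal(x.shape)
        grad += delta * loss_fn(x + sigma * delta)
        grad -= delta * loss_fn(x - sigma * delta)
    return grad / (2 * sigma * n)

# Toy stand-in for the black-box loss; its true gradient is 2*x,
# so we can sanity-check the estimate.
x = np.array([1.0, -2.0, 0.5])
g = nes_gradient(lambda z: np.sum(z**2), x)

# One projected gradient ascent step under an L_inf constraint
# (epsilon and alpha are arbitrary illustrative values).
epsilon, alpha = 0.1, 0.02
x_adv = x + alpha * np.sign(g)
x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
```

Each estimator call costs $2n$ queries to the black-box model, which is what makes the query budget the central quantity in the paper's evaluation.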