ShortScience.org Latest Summaries
http://www.shortscience.org/
Quasi-Monte Carlo Variational Inference (arXiv:1807.01604)
Summary by Artem Sobolev, Sat, 21 Jul 2018 13:01:01 -0600

Variational Inference builds around the ELBO (Evidence Lower BOund) -- a lower bound on the marginal log-likelihood of the observed data $\log p(x) = \log \int p(x, z) dz$ (which is typically intractable). The ELBO makes use of an approximate posterior to form a lower bound:
$$
\log p(x) \ge \mathbb{E}_{q(z|x)} \log \frac{p(x, z)}{q(z|x)}
$$
# Introduction to Quasi Monte Carlo
It's assumed that both the joint $p(x, z)$ (or, equivalently, the likelihood $p(x|z)$ and the prior $p(z)$) and the appro...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.01604#artems
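The reparameterized ELBO estimator into which QMC points are substituted can be sketched on a toy conjugate model; everything below (the model, the scrambled Sobol sequence, the sample size) is my illustrative choice, not the paper's code:

```python
# Toy sketch of QMC-based ELBO estimation (model and numbers are
# illustrative, not from the paper): p(z) = N(0,1), p(x|z) = N(z,1),
# approximate posterior q(z|x) = N(m, s^2).
import numpy as np
from scipy.stats import norm, qmc

def log_joint(x, z):
    return norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0)

def elbo_estimate(x, m, s, u):
    # reparameterization: z = m + s * Phi^{-1}(u) with u uniform in (0,1)
    z = m + s * norm.ppf(u)
    return np.mean(log_joint(x, z) - norm.logpdf(z, m, s))

x, m, s = 1.0, 0.4, 0.6                      # deliberately mismatched q
u_qmc = qmc.Sobol(d=1, scramble=True, seed=0).random(256).ravel()
elbo_qmc = elbo_estimate(x, m, s, u_qmc)     # low-variance QMC estimate
u_mc = np.random.default_rng(0).random(256)
elbo_mc = elbo_estimate(x, m, s, u_mc)       # ordinary MC estimate

# ground truth for this conjugate model: the posterior is N(1/2, 1/2), so
# ELBO = log p(x) - KL(q || posterior), both available in closed form
log_px = norm.logpdf(x, 0.0, np.sqrt(2.0))
mp, sp = 0.5, np.sqrt(0.5)
kl = np.log(sp / s) + (s**2 + (m - mp)**2) / (2 * sp**2) - 0.5
```

On this 1-D toy problem the scrambled-Sobol estimate sits essentially on top of the closed-form ELBO, while the plain MC estimate fluctuates around it.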
Learning with Opponent-Learning Awareness (arXiv:1709.04326)
Summary by CodyWild, Fri, 20 Jul 2018 11:01:35 -0600

A central question of this paper is: under what circumstances will you see agents that have been trained to optimize their own reward implement strategies - like tit for tat - that are more sophisticated and yield higher overall reward than each agent simply pursuing its dominant strategy. The games under consideration here are “general sum” games like Iterated Prisoner’s Dilemma, where each agent’s dominant strategy is to defect, but with some amount of coordination or reciprocity, better...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1709.04326#decodyng
Insights on representational similarity in neural networks with canonical correlation (arXiv:1806.05759)
Summary by CodyWild, Thu, 19 Jul 2018 16:55:33 -0600

The overall goal of the paper is to measure how similar different layer activation profiles are to one another, in hopes of being able to quantify the similarity of the representations that different layers are learning. If you had a measure that captured this, you could ask questions like: “how similar are the representations that are learned by different networks on the same task”, and “what is the dynamic of representational change in a given layer throughout training”?
Canonical Corre...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.05759#decodyng
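The core of such a measure is plain canonical correlation analysis between two activation matrices; a minimal numpy sketch on toy data (this is vanilla CCA via orthonormal bases, not the paper's full SVCCA/PWCCA pipeline):

```python
# Minimal numpy sketch of canonical correlation between two activation
# matrices (toy data; plain CCA, not the paper's full pipeline).
import numpy as np

def canonical_correlations(X, Y):
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # orthonormal bases for the column spaces of the centered views
    Qx = np.linalg.svd(X, full_matrices=False)[0]
    Qy = np.linalg.svd(Y, full_matrices=False)[0]
    # singular values of Qx^T Qy are the canonical correlations in [0, 1]
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))                                # shared "signal"
X = Z @ rng.normal(size=(3, 10)) + 0.01 * rng.normal(size=(500, 10))
Y = Z @ rng.normal(size=(3, 8)) + 0.01 * rng.normal(size=(500, 8))
rho = canonical_correlations(X, Y)   # top 3 near 1, the rest much smaller
```

Because both "layers" share a 3-dimensional signal, the top three canonical correlations come out near 1 and the remaining ones near 0, which is exactly the kind of similarity profile the paper aggregates.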
BRUNO: A Deep Recurrent Model for Exchangeable Data (arXiv:1802.07535)
Summary by Artem Sobolev, Tue, 17 Jul 2018 23:18:12 -0600

A Bayesian best expresses beliefs about the next observation $x_{n+1}$ after observing $x_1, \dots, x_n$ using the **posterior predictive distribution**: $p(x_{n+1}\vert x_1, \dots, x_n)$. Typically one invokes de Finetti's theorem and assumes there exists an underlying model $p(x\vert\theta)$, hence $p(x_{n+1}\vert x_1, \dots, x_n) = \int p(x_{n+1} \vert \theta) p(\theta \vert x_1, \dots, x_n) d\theta$; however, this integral is far from tractable in most cases. Nevertheless, h...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.07535#artems
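For intuition, the de Finetti integral above is tractable in conjugate toy cases; a Beta-Bernoulli illustration of my own (BRUNO's whole point is to get a tractable predictive without an explicit $\theta$):

```python
# Conjugate toy case of the posterior predictive integral (illustration
# only; BRUNO avoids explicit theta): a Beta(a, b) prior over a coin's
# bias gives p(x_{n+1}=1 | x_1..x_n) = (a + heads) / (a + b + n).
import numpy as np

def posterior_predictive_heads(flips, a=1.0, b=1.0):
    return (a + sum(flips)) / (a + b + len(flips))

p = posterior_predictive_heads([1, 1, 0, 1])   # 3 heads in 4 flips

# the same value via numerical integration of the integral in the text:
# int p(x=1 | theta) p(theta | data) dtheta, with posterior Beta(4, 2)
theta = np.linspace(1e-6, 1 - 1e-6, 100001)
post = theta**3 * (1 - theta)       # unnormalized Beta(4, 2) density
p_num = (theta * post).sum() / post.sum()
```

The closed form and the brute-force integral agree, which is the luxury non-conjugate models (and hence BRUNO's setting) do not have.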
Learning by Asking Questions (arXiv:1712.01238)
Summary by Oleksandr Bailo, Mon, 09 Jul 2018 17:46:37 -0600

This paper is about an interactive Visual Question Answering (VQA) setting in which agents must ask questions about images to learn. This closely mimics how people learn from each other using natural language and has strong potential to learn much faster with less data. It is referred to as learning by asking (LBA) throughout the paper. The approach is composed of three models:
1. **Question proposal module** is responsible for generating _important_ questions about the image. It is a combination of...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.01238#ukrdailo
Embodied Question Answering (arXiv:1711.11543)
Summary by Oleksandr Bailo, Sun, 08 Jul 2018 12:32:56 -0600

This paper introduces a new AI task - Embodied Question Answering. The goal of this task is for an agent to be able to answer a question by observing the environment through a single egocentric RGB camera while navigating inside the environment. The agent has 4 natural modules:
1. **Vision**. 224x224 RGB images are processed by a CNN to produce a fixed-size representation. This CNN is pretrained on pixel-to-pixel tasks such as RGB reconstruction, semantic segmentation, and depth est...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.11543#ukrdailo
Tree-to-Sequence Attentional Neural Machine Translation (DOI: 10.18653/v1/p16-1078)
Summary by Tim Miller, Wed, 04 Jul 2018 02:12:50 -0600

This work extends sequence-to-sequence models for machine translation by using syntactic information on the source language side. This paper looks at the translation task where English is the source language and Japanese is the target language. The dataset is the ASPEC corpus of scientific paper abstracts that seem to be in both English and Japanese? (See note below). The trees for the source (English) are generated by running the ENJU parser on the English data, resulting in binary trees, and ...
http://www.shortscience.org/paper?bibtexKey=10.18653/v1/p16-1078#tmills
Taskonomy: Disentangling Task Transfer Learning (arXiv:1804.08328)
Summary by Oleksandr Bailo, Tue, 03 Jul 2018 15:43:38 -0600

The goal of this work is to perform transfer learning among numerous tasks and to discover visual relationships among them. Specifically, while we might intuitively guess that the depth of an image and its surface normals are related, this work takes a step forward and discovers beneficial relationships among 26 tasks in terms of task transferability - many of them are not obvious. This is important for scenarios where an insufficient budget is available for annotating the target task; thus, learned repr...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.08328#ukrdailo
Adversarial Attacks on Neural Network Policies (arXiv:1702.02284)
Summary by David Stutz, Mon, 02 Jul 2018 02:46:39 -0600

Huang et al. study adversarial attacks on reinforcement learning policies. One of the main problems, in contrast to supervised learning, is that there might not be a reward in every time step, meaning there is no clear objective to use. Such an objective, however, is essential when crafting adversarial examples, as they are mostly based on maximizing the training loss. To avoid this problem, Huang et al. assume a well-trained policy; the policy is expected to output a distribution over actions. Then, adversarial...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.02284#davidstutz
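That workaround (treating the policy's own most likely action as a pseudo-label and maximizing its cross-entropy) can be sketched with a toy linear softmax policy; the shapes, the policy, and the FGSM-style step size below are my illustrative assumptions, not the paper's setup:

```python
# Hedged sketch: with no reward available, treat the policy's most likely
# action as a pseudo-label and take an FGSM-style step on the observation.
# Toy linear softmax policy; shapes and epsilon are arbitrary choices.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))          # 4 actions, 16-dim observation
x = rng.normal(size=16)

p = softmax(W @ x)
a = int(np.argmax(p))                 # pseudo-label: currently chosen action

def xent(obs):
    return -np.log(softmax(W @ obs)[a])

# cross-entropy gradient w.r.t. the observation for a linear softmax policy
grad_x = W.T @ (p - np.eye(4)[a])
x_adv = x + 0.5 * np.sign(grad_x)     # ascend the loss on the chosen action

loss, loss_adv = xent(x), xent(x_adv)  # loss_adv >= loss by convexity
```

Since cross-entropy of a linear softmax model is convex in the observation, the signed-gradient step is guaranteed not to decrease the loss, i.e. it pushes probability mass away from the policy's preferred action.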
Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning (arXiv:1712.03141)
Summary by David Stutz, Thu, 28 Jun 2018 19:16:01 -0600

Biggio and Roli provide a comprehensive survey and discussion of work in adversarial machine learning. In contrast to related work [1,2], they explicitly discuss the relation of recent developments regarding the security of deep neural networks (as primarily discussed in [1] and [2]) to adversarial machine learning in general. The latter can be traced back to early work starting in 2004, e.g. involving adversarial attacks on spam filters. As a result, the terminology used by Biggio and Roli is slig...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.03141#davidstutz
Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey (arXiv:1801.00553)
Summary by David Stutz, Thu, 28 Jun 2018 19:11:16 -0600

Akhtar and Mian present a comprehensive survey of attacks on and defenses of deep neural networks, specifically in computer vision. Published on ArXiv in January 2018, but probably written prior to August 2017, the survey includes recent attacks and defenses. For example, Table 1 presents an overview of attacks on deep neural networks – categorized by knowledge, target and perturbation measure. The authors also provide a strength measure – in the form of a 1-5 star “rating”. Personally, ho...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.00553#davidstutz
Adversarial Examples: Attacks and Defenses for Deep Learning (arXiv:1712.07107)
Summary by David Stutz, Thu, 28 Jun 2018 19:06:48 -0600

Yuan et al. present a comprehensive survey of attacks, defenses and studies regarding the robustness and security of deep neural networks. Published on ArXiv in December 2017, it includes the most recent attacks and defenses. For example, Table 1 lists all known attacks – Yuan et al. categorize the attacks according to the level of knowledge needed, targeted or non-targeted, the optimization needed (e.g. iterative) as well as the perturbation measure employed. As a result, Table 1 gives a solid o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.07107#davidstutz
Adversarial Diversity and Hard Positive Generation (arXiv:1605.01775)
Summary by David Stutz, Thu, 28 Jun 2018 18:59:29 -0600

Rozsa et al. propose PASS, a perceptual similarity metric invariant to homographies, to quantify adversarial perturbations. In particular, PASS is based on the structural similarity metric SSIM [1]; specifically
$PASS(\tilde{x}, x) = SSIM(\psi(\tilde{x},x), x)$
where $\psi(\tilde{x}, x)$ transforms the perturbed image $\tilde{x}$ to the image $x$ by applying a homography $H$ (which can be found through optimization). Based on this similarity metric, they consider additional attacks which creat...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.01775#davidstutz
Measuring Neural Net Robustness with Constraints (arXiv:1605.07262)
Summary by David Stutz, Thu, 28 Jun 2018 18:32:44 -0600

Bastani et al. propose formal robustness measures and an algorithm for approximating them for piece-wise linear networks. Specifically, the notion of robustness is similar to related work:
$\rho(f,x) = \inf\{\epsilon \geq 0 \,|\, f \text{ is not } (x,\epsilon)\text{-robust}\}$
where $(x,\epsilon)$-robustness demands that for every $x'$ with $\|x'-x\|_\infty \leq \epsilon$ it holds that $f(x') = f(x)$ – in other words, the label does not change for perturbations $\eta = x'-x$ which are small in terms of the $L_\...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.07262#davidstutz
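For a linear binary classifier this pointwise robustness even has a closed form, which makes a handy sanity check; the toy case below is mine (the paper itself encodes piece-wise linear networks as constraint systems):

```python
# Toy sanity check (not the paper's constraint encoding): for a linear
# binary classifier f(x) = sign(w.x + b), the L-infinity pointwise
# robustness is rho(f, x) = |w.x + b| / ||w||_1.
import numpy as np

def rho_linear(w, b, x):
    return abs(w @ x + b) / np.abs(w).sum()

w = np.array([2.0, -1.0, 0.5])
b = 0.25
x = np.array([1.0, 0.5, -1.0])

eps = rho_linear(w, b, x)
# the worst-case perturbation of size eps lands exactly on the boundary
x_adv = x - np.sign(w @ x + b) * eps * np.sign(w)
margin = w @ x_adv + b
```

The perturbation of L-infinity size `eps` drives the margin exactly to zero, confirming that `eps` is the infimum in the definition above.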
Deep Image Prior (arXiv:1711.10925)
Summary by David Stutz, Thu, 28 Jun 2018 18:23:07 -0600

Ulyanov et al. utilize untrained neural networks as regularizer/prior for various image restoration tasks such as denoising, inpainting and super-resolution. In particular, the standard formulation of such tasks, i.e.
$x^\ast = \arg\min_x E(x, x_0) + R(x)$
where $x_0$ is the input image and $E$ a task-dependent data term, is rephrased as follows:
$\theta^\ast = \arg\min_\theta E(f_\theta(z), x_0)$ and $x^\ast = f_{\theta^\ast}(z)$
for a fixed but random $z$. Here, the regularizer $R$ is esse...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.10925#davidstutz
Adversarial Spheres (arXiv:1801.02774)
Summary by David Stutz, Thu, 28 Jun 2018 18:14:51 -0600

Gilmer et al. study the existence of adversarial examples on a synthetic toy dataset consisting of two concentric spheres. The dataset is created by randomly sampling examples from two concentric spheres, one with radius $1$ and one with radius $R = 1.3$. While the authors argue that different difficulty levels of the dataset can be created by varying $R$ and the dimensionality, they merely experiment with $R = 1.3$ and a dimensionality of $500$. The motivation to study this dataset comes from the ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02774#davidstutz
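Generating such a dataset is a few lines of numpy; the radius and dimensionality follow the summary, while the normalize-a-Gaussian sampling trick is the standard way to draw uniform directions (my sketch, not the paper's code):

```python
# Sketch of the concentric-spheres dataset: directions uniform on the unit
# sphere (normalized Gaussian vectors), scaled to radius 1 or R = 1.3.
import numpy as np

def sample_spheres(n, d, R=1.3, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, d))
    z /= np.linalg.norm(z, axis=1, keepdims=True)   # uniform directions
    y = rng.integers(0, 2, size=n)                  # 0: inner, 1: outer
    return z * np.where(y == 1, R, 1.0)[:, None], y

X, y = sample_spheres(1000, 500)
```

Every sample lies exactly on one of the two spheres, so the ground-truth decision boundary is any sphere of radius between $1$ and $R$.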
Robustness of classifiers: from adversarial to random noise (arXiv:1608.08967)
Summary by David Stutz, Thu, 28 Jun 2018 18:02:30 -0600

Fawzi et al. study robustness in the transition from random samples to semi-random and adversarial samples. Specifically, they present bounds relating the norm of an adversarial perturbation to the norm of random perturbations – for the exact form I refer to the paper. Personally, I find the definition of semi-random noise most interesting, as it allows one to get an intuition for distinguishing random noise from adversarial examples. As in related literature, adversarial examples are defined as
...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.08967#davidstutz
A Boundary Tilting Persepective on the Phenomenon of Adversarial Examples (arXiv:1608.07690)
Summary by David Stutz, Thu, 28 Jun 2018 17:54:18 -0600

Tanay and Griffin introduce the boundary tilting perspective as an alternative to the “linear explanation” for adversarial examples. Specifically, they argue that it is not reasonable to assume that the linearity in deep neural networks causes the existence of adversarial examples. Originally, Goodfellow et al. [1] explained the impact of adversarial examples by considering a linear classifier:
$w^T x' = w^Tx + w^T\eta$
where $\eta$ is the adversarial perturbations. In large dimensions, the s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.07690#davidstutz
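The linear argument being debated is easy to reproduce numerically: with $\eta = \epsilon \, \text{sign}(w)$, the activation shift $w^T\eta = \epsilon\|w\|_1$ grows linearly with the dimension even though $\|\eta\|_\infty = \epsilon$ stays fixed (toy weights of my choosing):

```python
# Numeric check of the "linear explanation": for eta = eps * sign(w), the
# shift w.eta = eps * ||w||_1 grows with dimension while ||eta||_inf = eps.
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01
shift = {}
for d in (10, 100, 1000):
    w = rng.normal(size=d)
    eta = eps * np.sign(w)
    shift[d] = w @ eta    # equals eps * sum(|w_i|), roughly eps * 0.8 * d
```

Tanay and Griffin's point is that this growth alone does not explain adversarial examples in practice; the sketch only reproduces the premise they argue against.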
Certified Defenses against Adversarial Examples (arXiv:1801.09344)
Summary by David Stutz, Thu, 28 Jun 2018 17:50:32 -0600

Raghunathan et al. provide an upper bound on the adversarial loss of two-layer networks and also derive a regularization method to minimize this upper bound. In particular, the authors consider the scoring functions $f^i(x) = V_i^T\sigma(Wx)$ with bounded derivative $\sigma'(z) \in [0,1]$, which holds for Sigmoid and ReLU activation functions. Still, the model is very constrained compared to recent, well-performing deep (convolutional) neural networks. The upper bound is then derived by considerin...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.09344#davidstutz
Parseval Networks: Improving Robustness to Adversarial Examples (conf/icml/CisseBGDU17)
Summary by David Stutz, Thu, 28 Jun 2018 17:41:17 -0600

Cisse et al. propose Parseval networks, deep neural networks regularized to learn orthonormal weight matrices. Similar to the work by Hein et al. [1], the main idea is to constrain the Lipschitz constant of the network – which essentially means constraining the Lipschitz constant of each layer independently. For weight matrices, this can be achieved by constraining the matrix norm. However, this (depending on the norm used) is often intractable during gradient descent training. Therefore, Cis...
http://www.shortscience.org/paper?bibtexKey=conf/icml/CisseBGDU17#davidstutz
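The tractable workaround is a cheap retraction toward orthonormal rows after each gradient step, $W \leftarrow (1+\beta)W - \beta W W^T W$. Run in isolation it converges to a Parseval tight frame; note the $\beta = 0.5$ below is exaggerated for a quick standalone demo, whereas the paper interleaves one such step with a small $\beta$ per SGD update:

```python
# The Parseval tightness update W <- (1 + beta) W - beta * W W^T W,
# iterated on a small random matrix. beta = 0.5 is exaggerated for a
# standalone demo; in training it is applied once per SGD step, small beta.
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(4, 8))   # rows converge to an orthonormal set
beta = 0.5

for _ in range(100):
    W = (1 + beta) * W - beta * (W @ W.T @ W)

gram = W @ W.T                      # approximately the 4x4 identity
```

The update only drives the singular values of $W$ toward 1 (it fixes the singular vectors), which is why it is so much cheaper than re-orthonormalizing via an SVD.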
Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality (arXiv:1801.02613)
Summary by David Stutz, Thu, 28 Jun 2018 17:33:43 -0600

Ma et al. detect adversarial examples based on their estimated intrinsic dimensionality. I want to note that this work is also similar to [1] – in both publications, local intrinsic dimensionality is used to analyze adversarial examples. Specifically, the intrinsic dimensionality of a sample is estimated based on the radii $r_i(x)$ of the $k$ nearest neighbors around a sample $x$:
$- \left(\frac{1}{k} \sum_{i = 1}^k \log \frac{r_i(x)}{r_k(x)}\right)^{-1}$.
For details regarding the original,...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02613#davidstutz
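The displayed estimator is a few lines of numpy; on synthetic data with a known intrinsic dimension it recovers roughly that dimension (the sanity check below is mine, not from the paper):

```python
# Direct transcription of the LID estimate from the k nearest-neighbor
# radii, checked on synthetic data with known intrinsic dimension.
import numpy as np

def lid(radii):
    r = np.sort(np.asarray(radii, dtype=float))
    return -1.0 / np.mean(np.log(r / r[-1]))

# points uniform in a 5-dimensional ball: the LID around the center is ~5
rng = np.random.default_rng(0)
d, n, k = 5, 20000, 100
v = rng.normal(size=(n, d))
pts = v / np.linalg.norm(v, axis=1, keepdims=True) * rng.random((n, 1)) ** (1 / d)
r_k = np.sort(np.linalg.norm(pts, axis=1))[:k]   # k smallest distances to 0
est = lid(r_k)
```

Ma et al.'s observation is then that adversarial examples tend to sit in regions where this estimate is anomalously high compared to clean data.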
Detecting Adversarial Samples from Artifacts (arXiv:1703.00410)
Summary by David Stutz, Wed, 27 Jun 2018 21:38:25 -0600

Feinman et al. use dropout to compute an uncertainty measure that helps to identify adversarial examples. Their so-called Bayesian Neural Network Uncertainty is computed as follows:
$\frac{1}{T} \sum_{i=1}^T \hat{y}_i^T \hat{y}_i - \left(\frac{1}{T}\sum_{i=1}^T \hat{y}_i\right)^T\left(\frac{1}{T}\sum_{i=1}^T \hat{y}_i\right)$
where $\{\hat{y}_1,\ldots,\hat{y}_T\}$ is a set of stochastic predictions (i.e. predictions with different noise patterns in the dropout layers). Here, it can easily be seen that this measure co...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.00410#davidstutz
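The expression above is just the total variance of the $T$ stochastic outputs; in numpy, with toy prediction sets standing in for actual dropout forward passes:

```python
# The uncertainty measure as written: mean of y_i^T y_i minus the squared
# norm of the mean prediction (toy vectors stand in for dropout passes).
import numpy as np

def bnn_uncertainty(Y):
    # Y has shape (T, C): T stochastic softmax outputs over C classes
    mean = Y.mean(axis=0)
    return np.mean(np.einsum('tc,tc->t', Y, Y)) - mean @ mean

stable = np.tile([0.9, 0.1], (10, 1))        # identical output every pass
u_stable = bnn_uncertainty(stable)           # ~0: no predictive spread

rng = np.random.default_rng(0)
jittery = rng.dirichlet([1.0, 1.0], size=10) # outputs fluctuate per pass
u_jittery = bnn_uncertainty(jittery)         # clearly positive
```

Identical predictions across passes give zero uncertainty, fluctuating ones give a large value; the detector thresholds exactly this spread.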
Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods (arXiv:1705.07263)
Summary by David Stutz, Wed, 27 Jun 2018 21:29:35 -0600

Carlini and Wagner study the effectiveness of adversarial example detectors as a defense strategy and show that most of them can be bypassed easily by known attacks. Specifically, they consider a set of adversarial example detection schemes, including neural networks as detectors and statistical tests. After extensive experiments, the authors provide a set of lessons which include:
- Randomization is by far the most effective defense (e.g. dropout).
- Defenses seem to be dataset-specific. There is...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07263#davidstutz
On the (Statistical) Detection of Adversarial Examples (arXiv:1702.06280)
Summary by David Stutz, Wed, 27 Jun 2018 21:22:18 -0600

Grosse et al. use statistical tests to detect adversarial examples; additionally, machine learning algorithms are adapted to detect adversarial examples on the fly while performing classification. The idea of using statistical tests to detect adversarial examples is simple: assuming that there is a true data distribution, a machine learning algorithm can only approximate this distribution – i.e. each algorithm “learns” an approximate distribution. The ideal adversary uses this discrepancy to d...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.06280#davidstutz
Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients (arXiv:1711.09404)
Summary by David Stutz, Wed, 27 Jun 2018 21:08:28 -0600

Ross and Doshi-Velez propose input gradient regularization to improve the robustness and interpretability of neural networks. As the discussion of interpretability is quite limited in the paper, the main contribution is an extensive evaluation of input gradient regularization against adversarial examples – in comparison to defenses such as distillation or adversarial training. Specifically, input gradient regularization as proposed in [1] is used:
$\arg\min_\theta H(y,\hat{y}) + \lambda \|\nabla_x H(y,\ha...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09404#davidstutz
Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation (conf/nips/HeinA17)
Summary by David Stutz, Wed, 27 Jun 2018 20:04:56 -0600

Hein and Andriushchenko give an intuitive bound on the robustness of neural networks based on the local Lipschitz constant. With robustness, the authors refer to a small $\epsilon$-ball around each sample; this ball is supposed to describe the region where the neural network predicts a constant class. This means that adversarial examples have to introduce changes large enough to leave these robust areas. Larger $\epsilon$-balls imply higher robustness to adversarial examples.
When considering a singl...
http://www.shortscience.org/paper?bibtexKey=conf/nips/HeinA17#davidstutz
Adversarial Vulnerability of Neural Networks Increases With Input Dimension (arXiv:1802.01421)
Summary by David Stutz, Wed, 27 Jun 2018 19:57:22 -0600

Simon-Gabriel et al. study the robustness of neural networks with respect to the input dimensionality. Their main hypothesis is that the vulnerability of neural networks against adversarial perturbations increases with the input dimensionality. To support this hypothesis, they provide a theoretical analysis as well as experiments.
The general idea of robustness is that small perturbations $\delta$ of the input $x$ only result in small variations $\delta \mathcal{L}$ of the loss:
$\delta \ma...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.01421#davidstutz
Biologically inspired protection of deep networks from adversarial attacks (arXiv:1703.09202)
Summary by David Stutz, Wed, 27 Jun 2018 19:41:53 -0600

Nayebi and Ganguli propose saturating neural networks as a defense against adversarial examples. The main observation driving this paper can be stated as follows: neural networks are essentially based on linear sums of neurons (e.g. fully connected layers, convolutional layers) which are then activated; by injecting a small amount of noise per neuron it is possible to shift the final sum by large values, thereby propagating the noise through the network and fooling the network into misclassifying a...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.09202#davidstutz
Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks (arXiv:1704.01155)
Summary by David Stutz, Wed, 27 Jun 2018 19:25:51 -0600

Xu et al. propose feature squeezing for detecting and defending against adversarial examples. In particular, they consider “squeezing” the bit depth of the input images as well as local and non-local smoothing (Gaussian, median filtering etc.). In experiments they show that feature squeezing preserves accuracy while defending against adversarial examples. Figure 1 additionally shows an illustration of how feature squeezing can be used to detect adversarial examples.
Figure 1: Illustration ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.01155#davidstutz
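Two of the squeezers are tiny to implement; a sketch (in the detection setting, one would then threshold the distance between the model's predictions on $x$ and on the squeezed $x$):

```python
# Sketch of two squeezers: bit-depth reduction and local median smoothing.
# Detection compares the model's outputs on x and on squeeze(x).
import numpy as np
from scipy.ndimage import median_filter

def squeeze_bit_depth(x, bits):
    levels = 2 ** bits - 1          # x assumed scaled to [0, 1]
    return np.round(x * levels) / levels

def squeeze_median(x, size=2):
    return median_filter(x, size=size)

rng = np.random.default_rng(0)
img = rng.random((8, 8))
low_depth = squeeze_bit_depth(img, 3)   # at most 8 distinct gray levels
smoothed = squeeze_median(img)
```

Bit-depth squeezing quantizes each pixel to the nearest of $2^b$ levels, so it never moves a pixel by more than half a quantization step; adversarial perturbations that live below that resolution are simply erased.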
Generative adversarial networks uncover epidermal regulators and predict single cell perturbations (DOI: 10.1101/262501)
Summary by David Stutz, Wed, 27 Jun 2018 19:17:53 -0600

Lee et al. propose a variant of adversarial training where a generator is trained simultaneously to generate adversarial perturbations. This approach follows the idea that it is possible to “learn” how to generate adversarial perturbations (as in [1]). In this case, the authors use the gradient of the classifier with respect to the input as a hint for the generator. Both generator and classifier are then trained in an adversarial setting (analogously to generative adversarial networks), see t...
http://www.shortscience.org/paper?bibtexKey=10.1101/262501#davidstutz
Certifying Some Distributional Robustness with Principled Adversarial Training (arXiv:1710.10571)
Summary by David Stutz, Wed, 27 Jun 2018 19:08:46 -0600

Sinha et al. introduce a variant of adversarial training based on distributionally robust optimization. I strongly recommend reading the paper for understanding the introduced theoretical framework. The authors also provide guarantees on the obtained adversarial loss – and show experimentally that this guarantee is a realistic indicator. The adversarial training variant itself follows the general strategy of training on adversarially perturbed training samples in a min-max framework. In each ite...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.10571#davidstutz
Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization (arXiv:1511.05432)
Summary by David Stutz, Wed, 27 Jun 2018 19:00:07 -0600

Shaham et al. provide an interpretation of adversarial training in the context of robust optimization. In particular, adversarial training is posed as a min-max problem (similar to other related work, as I found):
$\min_\theta \sum_i \max_{r \in U_i} J(\theta, x_i + r, y_i)$
where $U_i$ is called the uncertainty set corresponding to sample $x_i$ – in the context of adversarial examples, this might be an $\epsilon$-ball around the sample quantifying the maximum perturbation allowed; $(x_i, y_i)...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.05432#davidstutz
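For a linear model the inner maximization over an $\epsilon$-ball even has a closed form, so the whole min-max loop fits in a few lines; the toy logistic regression below is my illustration, not the paper's networks:

```python
# Toy min-max adversarial training for logistic regression (illustrative,
# not the paper's setup): for a linear model the worst L-inf perturbation
# in the eps-ball is the closed form -(2y - 1) * eps * sign(w).
import numpy as np

rng = np.random.default_rng(0)
n, d, eps, lr = 200, 5, 0.1, 0.5
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (X @ w_true > 0).astype(float)
s = 2 * y - 1                                # labels as +-1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
for _ in range(200):
    X_adv = X - eps * s[:, None] * np.sign(w)[None, :]   # inner max
    p = sigmoid(X_adv @ w)
    w -= lr * X_adv.T @ (p - y) / n                      # outer min
acc = np.mean((X @ w > 0) == (y == 1))
```

Training on the worst-case perturbed inputs effectively adds an $\epsilon\|w\|_1$ margin penalty to each sample's logit, which is the robust-optimization reading of adversarial training that the paper develops.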
Learning with a Strong Adversary (arXiv:1511.03034)
Summary by David Stutz, Wed, 27 Jun 2018 18:53:50 -0600

Huang et al. propose a variant of adversarial training called “learning with a strong adversary”. In spirit, the idea is also similar to related work [1]. In particular, the authors consider the min-max objective
$\min_g \sum_i \max_{\|r^{(i)}\|\leq c} l(g(x_i + r^{(i)}), y_i)$
where $g$ ranges over expressible functions and $(x_i, y_i)$ is a training sample. In the remainder of the paper, Huang et al. address the problem of efficiently computing $r^{(i)}$ – i.e. a strong adversarial exam...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.03034#davidstutz
Distributional Smoothing with Virtual Adversarial Training (arXiv:1507.00677)
Summary by David Stutz, Wed, 27 Jun 2018 18:47:59 -0600

Miyato et al. propose distributional smoothing (or virtual adversarial training) as a defense against adversarial examples. However, I think that both terms do not give a good intuition of what is actually done. Essentially, a regularization term is introduced. Letting $p(y|x,\theta)$ be the learned model, the regularizer is expressed as
$\text{KL}(p(y|x,\theta) \| p(y|x+r,\theta))$
where $r$ is the perturbation that maximizes the Kullback-Leibler divergence above, i.e.
$r = \arg\max_r \{\text{KL}(...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1507.00677#davidstutz
Efficient Defenses Against Adversarial Attacks (arXiv:1707.06728)
Summary by David Stutz, Wed, 27 Jun 2018 18:43:14 -0600

Zantedeschi et al. propose Gaussian data augmentation in conjunction with bounded $\text{ReLU}$ activations as a defense strategy against adversarial examples. Here, Gaussian data augmentation refers to the practice of adding Gaussian noise to the input during training.
http://www.shortscience.org/paper?bibtexKey=journals/corr/1707.06728#davidstutz
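Both ingredients are one-liners; a sketch (the clipping threshold `t` and the noise scale are hyperparameters of the defense, values below are arbitrary):

```python
# The two ingredients as code: a bounded ReLU that clips activations at t,
# and Gaussian data augmentation applied to training inputs.
import numpy as np

def bounded_relu(x, t=1.0):
    return np.clip(x, 0.0, t)

def gaussian_augment(X, sigma, rng):
    return X + rng.normal(scale=sigma, size=X.shape)

acts = bounded_relu(np.array([-2.0, -0.3, 0.4, 0.9, 3.5]))
X_aug = gaussian_augment(np.zeros((3, 4)), 0.1, np.random.default_rng(0))
```

Clipping the activations bounds how far any single-layer perturbation can propagate, which is the saturation effect the defense relies on.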
Ensemble Robustness of Deep Learning Algorithms (arXiv:1602.02389)
Summary by David Stutz, Wed, 27 Jun 2018 18:29:49 -0600

Zahavy et al. introduce the concept of ensemble robustness and show that it can be used as an indicator for generalization performance. In particular, the main idea is to lift the concept of robustness against adversarial examples to ensembles of networks – as trained, e.g., through Dropout or Bayes-by-Backprop. Letting $Z$ denote the sample set, a learning algorithm is $(K, \epsilon)$-robust if $Z$ can be divided into $K$ disjoint sets $C_1,\ldots,C_K$ such that for every training set $s_1,\ldots,s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.02389#davidstutz
Towards Robust Neural Networks via Random Self-ensemble (arXiv:1712.00673)
Summary by David Stutz, Wed, 27 Jun 2018 18:18:37 -0600

Liu et al. propose randomizing neural networks, implicitly learning an ensemble of models, to defend against adversarial attacks. In particular, they introduce Gaussian noise layers before regular convolutional layers. The noise can be seen as an additional parameter of the model. During training, noise is randomly added. During testing, the model is evaluated on a single testing input using multiple random noise vectors; this essentially corresponds to an ensemble of different models (parameterize...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.00673#davidstutz
Towards Reverse-Engineering Black-Box Neural Networks (arXiv:1711.01768)
Summary by David Stutz, Wed, 27 Jun 2018 18:07:27 -0600

Oh et al. propose two different approaches for whitening black box neural networks, i.e. predicting details of their internals such as architecture or training procedure. In particular, they consider attributes regarding the architecture (activation function, dropout, max pooling, kernel size of convolutional layers, number of convolutional/fully connected layers etc.), attributes concerning optimization (batch size and optimization algorithm) and attributes regarding the data (data split and size)...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.01768#davidstutz
Comment on "Biologically inspired protection of deep networks from adversarial attacks" (arXiv:1704.01547)
Summary by David Stutz, Wed, 27 Jun 2018 17:59:09 -0600

Brendel et al. propose a decision-based black-box attack against (deep convolutional) neural networks. Specifically, the so-called Boundary Attack starts with a random adversarial example (i.e. random noise that is not classified as the image to be attacked) and randomly perturbs this initialization to move closer to the target image while remaining misclassified. In pseudo code, the algorithm is described in Algorithm 1. The key component is the proposal distribution $P$ used to guide the adversar...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.01547#davidstutz
ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models (arXiv:1708.03999)
Summary by David Stutz, Tue, 26 Jun 2018 21:40:59 -0600

Chen et al. propose a gradient-based black-box attack to compute adversarial examples. Specifically, they follow the general idea of [1] where the following objective is optimized:
$\min_x \|x - x_0\|_2 + c \max\{\max_{i\neq t}\{z_i\} - z_t, -\kappa\}$.
Here, $x$ is the adversarial example based on training sample $x_0$. The second part expresses that $x$ is supposed to be misclassified, i.e. the logit $z_i$ for some $i \neq t$ distinct from the true label $t$ is supposed to be larger tha...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.03999#davidstutz
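The "zeroth order" part means the gradient of this objective is estimated purely from function evaluations, coordinate by coordinate. The core estimator, shown on a stand-in objective since the real attack would query the black-box model:

```python
# Core ZOO ingredient: symmetric-difference (zeroth-order) gradient
# estimation from black-box function queries. f is a stand-in for the
# attack objective, which in the real attack queries the model's logits.
import numpy as np

def zoo_gradient(f, x, h=1e-4):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)   # 2 queries per coordinate
    return g

f = lambda v: np.sum(v ** 2) + v[0]   # toy objective with known gradient
x = np.array([1.0, -2.0, 0.5])
g_est = zoo_gradient(f, x)
g_true = 2 * x + np.array([1.0, 0.0, 0.0])
```

Since each coordinate costs two model queries, the actual attack estimates only a batch of coordinates per step; the estimator itself is exactly this symmetric difference.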
Adversarial Robustness: Softmax versus Openmax (arXiv:1708.01697)
Summary by David Stutz, Tue, 26 Jun 2018 21:25:44 -0600

Rozsa et al. describe an adversarial attack against OpenMax [1] by directly targeting the logits. Specifically, they assume a network using OpenMax instead of a SoftMax layer to compute the final class probabilities. OpenMax enables “open-set” networks by also allowing them to reject input samples. By directly targeting the logits of the trained network, i.e. iteratively pushing the logits in a target direction, it does not matter whether SoftMax or OpenMax layers are used on top; the network can b...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.01697#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.01697#davidstutzTue, 26 Jun 2018 21:19:55 -06001511.07528journals/corr/1511.075283The Limitations of Deep Learning in Adversarial SettingsDavid StutzPapernot et al. introduce a novel attack on deep networks based on so-called adversarial saliency maps that are computed independently of a loss. Specifically, they consider – for a given network $F(X)$ – the forward derivative
$\nabla F = \frac{\partial F}{\partial X} = \left[\frac{\partial F_j(X)}{\partial x_i}\right]_{i,j}$.
Essentially, this is the regular derivative of $F$ with respect to its input; Papernot et al. seem to refer to it as the “forward” derivative as it stands in contra...
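The forward derivative defined above can be illustrated with a finite-difference Jacobian. The paper computes it analytically, layer by layer; this numerical version is only a sketch to make the indexing concrete:

```python
import numpy as np

def forward_derivative(F, X, h=1e-5):
    # numerical Jacobian [dF_j/dX_i]: rows index input components i,
    # columns index output components j, matching the formula above
    y0 = np.asarray(F(X))
    J = np.zeros((X.size, y0.size))
    for i in range(X.size):
        e = np.zeros_like(X)
        e.flat[i] = h
        J[i] = (np.asarray(F(X + e)) - y0) / h
    return J
```

For a linear map $F(X) = AX$ the result is $A^T$, which is a quick sanity check on the index convention.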
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.07528#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.07528#davidstutzTue, 26 Jun 2018 21:14:29 -06001712.02779journals/corr/1712.027793A Rotation and a Translation Suffice: Fooling CNNs with Simple TransformationsDavid StutzEngstrom et al. demonstrate that spatial transformations such as translations and rotations can be used to generate adversarial examples. Personally, however, I think that the paper does not address the question where adversarial perturbations “end” and generalization issues “start”. For larger translations and rotations, the problem is clearly a problem of generalization. Small ones could also be interpreted as adversarial perturbations – especially when they are computed under the in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.02779#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.02779#davidstutzTue, 26 Jun 2018 21:05:51 -06001607.02533journals/corr/1607.025333Adversarial examples in the physical worldDavid StutzKurakin et al. demonstrate that adversarial examples are also a concern in the physical world. Specifically, adversarial examples are crafted digitally and then printed to see if the classification network, running on a smartphone still misclassifies the examples. In many cases, adversarial examples are still able to fool the network, even after printing.
Figure 1: Illustration of the experimental setup.
Also find this summary at [davidstutz.de]().
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.02533#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.02533#davidstutzTue, 26 Jun 2018 21:01:38 -06001802.05365journals/corr/1802.053652Deep contextualized word representationsmnoukhovThis paper introduces a deep universal word embedding based on using a bidirectional LM (in this case, biLSTM). First, words are embedded with a CNN-based, character-level, context-free, token embedding into $x_k^{LM}$ and then each sentence is parsed using a biLSTM, maximizing the log-likelihood of a word given its forward and backward context (much like a normal language model).
The innovation is in taking the output of each layer of the LSTM ($h_{k,j}^{LM}$ being the output at layer $j$)
$...
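Reading the truncated formula as the usual ELMo task-specific combination (softmax-normalized scalar weights $s_j$ over the layer outputs $h_{k,j}^{LM}$, scaled by a learned $\gamma$), the combination step can be sketched as:

```python
import numpy as np

def combine_layers(layer_outputs, s, gamma=1.0):
    # softmax-normalize the per-layer scalar weights s_j, then take the
    # weighted sum of the biLM layer outputs h_{k,j}^{LM}, scaled by gamma
    w = np.exp(s - np.max(s))
    w = w / w.sum()
    return gamma * sum(wj * h for wj, h in zip(w, layer_outputs))
```

With $s = 0$ this reduces to a plain average over layers, which is a useful sanity check.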
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.05365#mnoukhov
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.05365#mnoukhovTue, 26 Jun 2018 20:56:47 -06001707.03501journals/corr/1707.035013NO Need to Worry about Adversarial Examples in Object Detection in Autonomous VehiclesDavid StutzLu et al. present experiments regarding adversarial examples in the real world, i.e. after printing them. Personally, I find it interesting that researchers are studying how networks can be fooled by physically perturbing images. For me, one of the main conclusions it that it is very hard to evaluate the robustness of networks against physical perturbations. Often it is unclear whether changed lighting conditions, distances or viewpoints to objects might cause the network to fail – which means...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1707.03501#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1707.03501#davidstutzTue, 26 Jun 2018 20:56:04 -06001611.01236journals/corr/1611.012363Adversarial Machine Learning at ScaleDavid StutzKurakin et al. present some larger scale experiments using adversarial training on ImageNet to increase robustness. In particular, they claim to be the first using adversarial training on ImageNet. Furthermore, they provide experiments underlining the following conclusions:
- Adversarial training can also be seen as a regularizer. This, however, is not surprising, as training on noisy training samples is also known to act as regularization.
- Label leaking describes the observation that an adversar...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.01236#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.01236#davidstutzTue, 26 Jun 2018 20:53:02 -06001611.02770journals/corr/1611.027703Delving into Transferable Adversarial Examples and Black-box AttacksDavid StutzLiu et al. provide a comprehensive study on the transferability of adversarial examples considering different attacks and models on ImageNet. In their experiments, they consider both targeted and non-targeted attack and also provide a real-world example by attacking clarifai.com. Here, I want to list some interesting conclusions drawn from their experiments:
- Non-targeted attacks easily transfer between models; targeted attacks, in contrast, generally do not transfer – meaning that the target...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.02770#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.02770#davidstutzTue, 26 Jun 2018 20:45:23 -06001610.08401journals/corr/1610.084013Universal adversarial perturbationsDavid StutzMoosavi-Dezfooli et al. propose universal adversarial perturbations – perturbations that are image-agnostic. Specifically, they extend the framework for crafting adversarial examples, i.e. by iteratively solving
$\arg\min_r \|r\|_2$ s.t. $f(x + r) \neq f(x)$.
Here, $r$ denotes the adversarial perturbation, $x$ a training sample and $f$ the neural network. Instead of solving this problem for a specific $x$, the authors propose to solve the problem over the full training set, i.e. in each ite...
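The aggregation over the training set that this describes can be sketched as follows. This is a hedged outline, with the per-sample minimal-perturbation solver and the classifier passed in as stand-ins; it is not the paper's exact algorithm, which uses a DeepFool-style inner solver:

```python
import numpy as np

def universal_perturbation(X, predict, minimal_perturbation, xi=0.2, n_iters=5):
    # accumulate per-sample minimal perturbations over the training set,
    # projecting v back onto an L2 ball of radius xi after each update
    v = np.zeros_like(X[0])
    for _ in range(n_iters):
        for x in X:
            if predict(x + v) == predict(x):  # v does not yet fool this sample
                v = v + minimal_perturbation(x + v)
                norm = np.linalg.norm(v)
                if norm > xi:
                    v = v * (xi / norm)
    return v
```

The projection is what keeps the single perturbation $v$ small while the loop pushes it to fool as many samples as possible.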
http://www.shortscience.org/paper?bibtexKey=journals/corr/1610.08401#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1610.08401#davidstutzTue, 26 Jun 2018 20:39:55 -06001608.04644journals/corr/1608.046443Towards Evaluating the Robustness of Neural NetworksDavid StutzCarlini and Wagner propose three novel methods/attacks for adversarial examples and show that defensive distillation is not effective. In particular, they devise attacks for all three commonly used norms $L_1$, $L_2$ and $L_\infty$ – which are used to measure the deviation of the adversarial perturbation from the original testing sample. In the course of the paper, starting with the targeted objective
$\min_\delta d(x, x + \delta)$ s.t. $f(x + \delta) = t$ and $x+\delta \in [0,1]^n$,
they cons...
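One practical detail of this formulation is the box constraint $x + \delta \in [0,1]^n$, which the paper handles with a change of variables through $\tanh$ so the optimization over $w$ is unconstrained. A minimal sketch (function names are mine):

```python
import numpy as np

def to_box(w):
    # C&W change of variables: maps unconstrained w into the box [0,1]^n
    return 0.5 * (np.tanh(w) + 1.0)

def from_box(x, eps=1e-6):
    # inverse mapping, clipped slightly away from the box boundary
    # to keep arctanh numerically finite
    return np.arctanh(np.clip(2.0 * x - 1.0, -1.0 + eps, 1.0 - eps))
```

Optimizing over $w$ and reading off $x + \delta = \texttt{to\_box}(w)$ guarantees the pixel constraint by construction, with no projection step needed.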
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.04644#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.04644#davidstutzTue, 26 Jun 2018 20:24:14 -06001706.06083journals/corr/1706.060833Towards Deep Learning Models Resistant to Adversarial AttacksDavid StutzMadry et al. provide an interpretation of training on adversarial examples as a saddle-point (i.e. min-max) problem. Based on this formulation, they conduct several experiments on MNIST and CIFAR-10 supporting the following conclusions:
- Projected gradient descent might be the “strongest” adversary using first-order information. Here, gradient descent is used to maximize the loss of the classifier directly while always projecting onto the set of “allowed” perturbations (e.g. within an $\epsil...
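The maximize-then-project loop described above can be sketched in a few lines for the $L_\infty$ case. This is a simplified illustration (signed gradient ascent with clipping as the projection), with the loss gradient passed in as a stand-in:

```python
import numpy as np

def pgd_linf(loss_grad, x0, eps=0.1, alpha=0.02, n_steps=20):
    # ascend the classifier's loss with signed gradient steps, then
    # project each iterate back into the L-infinity ball around x0
    x = x0.copy()
    for _ in range(n_steps):
        x = x + alpha * np.sign(loss_grad(x))
        x = np.clip(x, x0 - eps, x0 + eps)  # projection onto the allowed set
    return x
```

Madry et al. additionally restart from random points inside the ball; the core iterate is the same step-and-project pattern.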
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.06083#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.06083#davidstutzTue, 26 Jun 2018 20:08:20 -06001412.6572journals/corr/1412.65723Explaining and Harnessing Adversarial ExamplesDavid StutzGoodfellow et al. introduce the fast gradient sign method (FGSM) to craft adversarial examples and further provide a possible interpretation of adversarial examples considering linear models. FGSM is a gradient-based, one-step method for generating adversarial examples. In particular, letting $J$ be the objective optimized during training and $\epsilon$ be the maximum $\infty$-norm of the adversarial perturbation, FGSM computes
$x' = x + \eta = x + \epsilon \text{sign}(\nabla_x J(x, y))$
where $y...
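The one-step update is direct to write down; a minimal sketch, with the training-loss gradient supplied as an argument:

```python
import numpy as np

def fgsm(x, grad_J, eps=0.1):
    # x' = x + eps * sign(grad_x J(x, y)): a single signed-gradient step
    return x + eps * np.sign(grad_J)
```

Because only the sign of the gradient is used, every perturbed coordinate moves by exactly $\pm\epsilon$ (or stays put where the gradient is zero), which is what makes the $\infty$-norm bound hold by construction.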
http://www.shortscience.org/paper?bibtexKey=journals/corr/1412.6572#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1412.6572#davidstutzTue, 26 Jun 2018 20:02:41 -06001705.07204journals/corr/1705.072043Ensemble Adversarial Training: Attacks and DefensesDavid StutzTramèr et al. introduce both a novel adversarial attack as well as a defense mechanism against black-box attacks termed ensemble adversarial training. I first want to highlight that – in addition to the proposed methods – the paper gives a very good discussion of state-of-the-art attacks as well as defenses and how to put them into context. Tramèr et al. consider black-box attacks, focusing on transferable adversarial examples. Their main observation is as follows: one-shot attacks (i.e....
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07204#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07204#davidstutzTue, 26 Jun 2018 19:56:11 -06001511.04508journals/corr/1511.045082Distillation as a Defense to Adversarial Perturbations against Deep Neural NetworksDavid StutzPapernot et al. build upon the idea of network distillation [1] and propose a simple mechanism to defend networks against adversarial attacks. The main idea of distillation – originally introduced to “distill” the knowledge of very deep networks into smaller ones – is to train a second, possibly smaller network, with the probability distributions of the original, possibly larger network as supervision. Papernot et al. as well as the authors of [1] argue that the probability distributions...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.04508#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.04508#davidstutzTue, 26 Jun 2018 18:29:02 -06001604.03540journals/corr/1604.035402Training Region-based Object Detectors with Online Hard Example MiningRyanDsouzaThe problem statement this paper tries to address is that the training set is distinguished by a large imbalance between the number of foreground examples and background examples. To make the point concrete: for sliding-window object detectors like the deformable parts model, the imbalance may be as extreme as 100,000 background examples to one annotated foreground example.
Before I proceed to give you the details of Hard Example Mining, I just want to note that HEM in its essence is mostly w...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1604.03540#ryandsouza
http://www.shortscience.org/paper?bibtexKey=journals/corr/1604.03540#ryandsouzaTue, 26 Jun 2018 14:30:07 -06001703.05175journals/corr/1703.051752Prototypical Networks for Few-shot LearningCodyWildThis paper describes an architecture designed for generating class predictions based on a set of features in situations where you may only have a few examples per class, or, even where you see entirely new classes at test time. Some prior work has approached this problem in ridiculously complex fashion, up to and including training a network to predict the gradient outputs of a meta-network that it thinks would best optimize loss, given a new class. The method of Prototypical Networks prides its...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.05175#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.05175#decodyngTue, 26 Jun 2018 05:00:56 -06001710.04087journals/corr/1710.040872Word Translation Without Parallel DataCodyWildThe core goal of this paper is to perform in an unsupervised (read: without parallel texts) way what other machine translation researchers had previously only effectively performed in a supervised way: the creation of a word-to-word translational mapping between natural languages. To frame the problem concretely: the researchers start with word embeddings learned in each language independently, and their desired output is a set of nearest neighbors for a source word that contains the true target...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.04087#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.04087#decodyngTue, 26 Jun 2018 04:58:44 -06001805.04770journals/corr/1805.047702Born Again Neural NetworksCodyWildA finding first publicized by Geoff Hinton is the fact that, when you train a simple, lower capacity module on the probability outputs of another model, you can often get a model that has comparable performance, despite that lowered capacity. Another, even more interesting finding is that, if you take a trained model, and train a model with identical structure on its probability outputs, you can often get a model with better performance than the original teacher, with quicker convergence.
This ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.04770#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.04770#decodyngTue, 26 Jun 2018 04:45:23 -06001802.05751journals/corr/1802.057513Image TransformerCodyWildLast year, a machine translation paper came out, with an unfortunately un-memorable name (the Transformer network) and a dramatic proposal for sequence modeling that eschewed both Recurrent NN and Convolutional NN structures, and, instead, used self-attention as its mechanism for “remembering” or aggregating information from across an input. Earlier this month, the same authors released an extension of that earlier paper, called Image Transformer, that applies the same attention-only approa...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.05751#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.05751#decodyngTue, 26 Jun 2018 04:45:23 -06001711.02827journals/corr/1711.028272Inverse Reward DesignCodyWild
It’s a commonly understood problem in Reinforcement Learning: that it is difficult to fully specify your exact reward function for an agent you’re training, especially when that agent will need to operate in conditions potentially different than those it was trained in. The canonical example of this, used throughout the Inverse Rewards Design paper, is that of an agent trained on an environment of grass and dirt, that now encounters an environment with lava. In a typical problem setup, the ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.02827#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.02827#decodyngTue, 26 Jun 2018 04:44:50 -06001704.06960journals/corr/1704.069602Translating NeuraleseCodyWildThis paper has an unusual and interesting goal, compared to those I more typically read: it wants to develop a “translation” between the messages produced by a model, and natural language used by a human. More specifically, the paper seeks to do this in the context of an two-player game, where one player needs to communicate information to the other. A few examples of this are:
- Being shown a color, and needing to communicate to your partner so they can choose that color
- Driving, in an ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.06960#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.06960#decodyngTue, 26 Jun 2018 04:44:18 -06001805.11604journals/corr/1805.116044How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift)CodyWildAt NIPS 2017, Ali Rahimi was invited on stage to give a keynote after a paper he was on received the “Test of Time” award. While there, in front of several thousand researchers, he gave an impassioned argument for more rigor: more small problems to validate our assumptions, more visibility into why our optimization algorithms work the way they do. The now-famous catchphrase of the talk was “alchemy”; he argued that the machine learning community has been effective at finding things that ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.11604#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.11604#decodyngTue, 26 Jun 2018 04:42:50 -06001803.08494journals/corr/1803.084942Group NormalizationCodyWildIf you were to survey researchers, and ask them to name the 5 most broadly influential ideas in Machine Learning from the last 5 years, I’d bet good money that Batch Normalization would be somewhere on everyone’s lists. Before Batch Norm, training meaningfully deep neural networks was an unstable process, and one that often took a long time to converge to success. When we added Batch Norm to models, it allowed us to increase our learning rates substantially (leading to quicker training) with...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.08494#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.08494#decodyngTue, 26 Jun 2018 04:42:07 -06001804.04849journals/corr/1804.048492The unreasonable effectiveness of the forget gateCodyWildI have a lot of fondness for this paper as a result of its impulse towards clear explanations, simplicity, and pushing back against complexity for complexity’s sake. The goal of the paper is pretty straightforward. Long Short Term Memory networks (LSTM) work by having a memory vector, and pulling information into and out of that vector through a gating system. These gates take as input the context of the network at a given timestep (the prior hidden state, and the current input), apply weight ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.04849#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.04849#decodyngTue, 26 Jun 2018 04:40:30 -06001802.04821journals/corr/1802.048213Evolved Policy GradientsCodyWildThe general goal of meta-learning systems is to learn useful shared structure across a broad distribution of tasks, in such a way that learning on a new task can be faster. Some of the historical ways this has been done have been through initializations (i.e. initializing the network at a point such that it is easy to further optimize on each individual task, drawn from some distribution of tasks), and recurrent network structures (where you treat the multiple timesteps of a recurrent network as...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.04821#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.04821#decodyngTue, 26 Jun 2018 04:39:50 -06001804.02464journals/corr/1804.024642Differentiable plasticity: training plastic neural networks with backpropagationCodyWildMeta learning is an area sparking a lot of research curiosity these days. It’s framed in different ways: models that can adapt, models that learn to learn, models that can learn a new task quickly. This paper uses a somewhat different lens: that of neural plasticity, and argues that applying the concept to modern neural networks will give us an effective, and biologically inspired way of building adaptable models. The basic premise of plasticity from a neurobiology perspective (at least how it...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02464#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02464#decodyngTue, 26 Jun 2018 04:39:08 -06001710.03641journals/corr/1710.036413Continuous Adaptation via Meta-Learning in Nonstationary and Competitive EnvironmentsCodyWildDeepMind’s recently released paper (one of a boatload coming out in the wake of ICLR, which just finished in Vancouver) addresses the problem of building an algorithm that can perform well on tasks that don’t just stay fixed in their definition, but instead evolve and change, without giving the agent a chance to re-train in the middle. An example of this, is one used at various points in the paper: of an agent trying to run East, that finds two of its legs (a different two each time) slowly ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.03641#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.03641#decodyngTue, 26 Jun 2018 04:38:18 -06001611.00179journals/corr/1611.001792Dual Learning for Machine TranslationCodyWildThe problem setting of the paper is the desire to perform translation in a monolingual setting, where datasets exist of each language independently, but little or no paired sentence data (paired here meaning that you know you have the same sentence or text in both languages). The paper outlines the prior methods in this area as being, first, training a single-language language model (i.e. train a model to take in a sentence, and return how coherent of a sentence it is in a given language) and us...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.00179#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.00179#decodyngTue, 26 Jun 2018 04:37:32 -06001607.04606journals/corr/1607.046062Enriching Word Vectors with Subword InformationCodyWildThis paper is a clever but conceptually simple idea to improve the vectors learned for individual words. In this proposed approach, instead of learning a distinct vector per word in the word, the model instead views a word as being composed of overlapping n-grams, which are combined to make the full word.
Recall: in the canonical skipgram approach to learning word embeddings, each word is represented by a single vector. The word might be tokenized first (for example, de-pluralized), but, funda...
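The n-gram decomposition the paper proposes can be sketched in a few lines. Following the fastText convention, the word is wrapped in boundary markers `<` and `>` before extracting overlapping n-grams (the 3-to-6 range is the paper's default; treat it here as an assumption):

```python
def char_ngrams(word, n_min=3, n_max=6):
    # overlapping character n-grams of the boundary-marked word;
    # the full word's vector is built from the vectors of these pieces
    w = "<" + word + ">"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]
```

Note how the boundary markers let the model distinguish a prefix like `<wh` from the same characters appearing mid-word, which is part of what makes the subword vectors informative.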
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.04606#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.04606#decodyngTue, 26 Jun 2018 04:34:37 -06001708.00107journals/corr/1708.001072Learned in Translation: Contextualized Word VectorsCodyWildThis paper’s approach goes a step further away from the traditional word embedding approach - of training embeddings as the lookup-table first layer of an unsupervised monolingual network - and proposes a more holistic form of transfer learning that involves not just transferring over learned knowledge contained in a set of vectors, but a fully trained model.
Transfer learning is the general idea of using part or all of a network trained on one task to perform a different task. The most comm...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.00107#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.00107#decodyngTue, 26 Jun 2018 04:32:53 -06001412.6448journals/corr/1412.64482Embedding Word Similarity with Neural Machine TranslationCodyWildIf you’ve been paying any attention to the world of machine learning in the last five years, you’ve likely seen everyone’s favorite example for how Word2Vec word embeddings work: king - man + woman = queen. Given the ubiquity of Word2Vec, and similar unsupervised embeddings, it can be easy to start thinking of them as the canonical definition of what a word embedding *is*. But that’s a little oversimplified. In the context of machine learning, an embedding layer simply means any layer st...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1412.6448#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1412.6448#decodyngTue, 26 Jun 2018 04:31:28 -06001805.09804journals/corr/1805.098042Implicit AutoencodersCodyWild This paper outlines (yet another) variation on a variational autoencoder (VAE), which is, at a high level, a model that seeks to 1) learn to construct realistic samples from the data distribution, and 2) capture meaningful information about the data within its latent space. The “latent space” is a way of referring to the information bottleneck that happens when you compress the input (typically for these examples: an image) into a low-dimensional vector, before trying to predict that input ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.09804#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.09804#decodyngTue, 26 Jun 2018 04:29:50 -06001804.02476journals/corr/1804.024762Associative Compression Networks for Representation LearningCodyWildThese days, a bulk of recent work in Variational AutoEncoders - a type of generative model - focuses on the question of how to add recently designed, powerful decoders (the part that maps from the compressed information bottleneck to the reconstruction) to VAEs, but still cause them to capture high level, conceptual information within the aforementioned information bottleneck (also know as a latent code). In the status quo, it’s the case that the decoder can do well enough even without conditi...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02476#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02476#decodyngTue, 26 Jun 2018 04:28:23 -06001512.09300journals/corr/1512.093002Autoencoding beyond pixels using a learned similarity metricCodyWildVariational Autoencoders are a type of generative model that seek to learn how to generate new data by incentivizing the model to be able to reconstruct input data, after compressing it to a low-dimensional space. Typically, the way that the reconstruction is scored against the original is by comparing the pixel by pixel values: a reconstruction gets a high score if it is able to place pixels of color in the same places that the original did. However, there are compelling reasons why this is a s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1512.09300#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1512.09300#decodyngTue, 26 Jun 2018 04:26:46 -06001711.00937journals/corr/1711.009372Neural Discrete Representation LearningCodyWildThere are mathematicians, still today, who look at deep learning, and get real salty over the lack of convex optimization. That is to say: convex functions are ones where you have an actual guarantees that gradient descent will converge, and mathematicians of olden times (i.e. 2006) spent reams of paper arguing that this or that function had convex properties, and thus could be guaranteed to converge, under this or that set of arcane conditions. And then, Deep Learning came along, with its huge...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.00937#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.00937#decodyngTue, 26 Jun 2018 04:25:50 -06001803.05428journals/corr/1803.054282A Hierarchical Latent Vector Model for Learning Long-Term Structure in MusicCodyWildI’ve spent the last few days pretty deep in the weeds of GAN theory - with all its attendant sample-squinting and arcane training diagnosis - and so today I’m shifting gears to an applied paper, that mostly showcases some clever modifications of an underlying technique. The goal of the MusicVAE is as you might expect: to make music. But the goal isn’t just the ability to produce patterns of notes that sound musical, it’s the ability to learn a vector space where we can modify the values ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.05428#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.05428#decodyngTue, 26 Jun 2018 04:24:17 -06001606.00704journals/corr/1606.007042Adversarially Learned InferenceCodyWildDespite their difficulties in training, Generative Adversarial Networks are still one of the most exciting recent ideas in machine learning; a way to generate data without the fuzziness and averaging of earlier methods. However, up until recently, there had been a major way in which the GAN’s primary competitor in the field, the Variational Autoencoder, was superior: it could do inference.
Intuitively, inference is the inverse of generation. Whereas generation works by taking some source of ra...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1606.00704#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1606.00704#decodyngTue, 26 Jun 2018 04:22:27 -06001611.04076journals/corr/1611.040762Least Squares Generative Adversarial NetworksCodyWildGenerative Adversarial Networks (GANs) are an exciting technique, a kernel of an effective concept that has been shown to be able to overcome many of the problems of previous generative models: particularly the fuzziness of VAEs. But, as I’ve mentioned before, and as you’ve doubtless read if you’re read any material about the topic, they’re finicky things, difficult to train in a stable way, and particularly difficult to not devolve into mode collapse. Mode collapse is a phenomenon where...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.04076#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.04076#decodyngTue, 26 Jun 2018 04:21:15 -06001611.02163journals/corr/1611.021632Unrolled Generative Adversarial NetworksCodyWildIf you’ve ever read a paper on Generative Adversarial Networks (from now on: GANs), you’ve almost certainly heard the author refer to the scourge upon the land of GANs that is mode collapse. When a generator succumbs to mode collapse, that means that, instead of modeling the full distribution, of input data, it will choose one region where there is a high density of data, and put all of its generated probability weight there. Then, on the next round, the discriminator pushes strongly away fr...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.02163#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.02163#decodyngTue, 26 Jun 2018 04:20:16 -06001703.10593journals/corr/1703.105932Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial NetworksCodyWildOver the last five years, artificial creative generation powered by ML has blossomed. We can now imagine buildings based off of a sketch, peer into the dog-tiled “dreams” of a convolutional net, and, as of 2017, turn images of horses into ones of zebras. This last problem - typically termed image-to-image translation - is the one that CycleGAN focuses on. The kinds of transformations that can fall under this category are pretty conceptually broad: zebras to horses, summer scenes to winter ones...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.10593#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.10593#decodyngTue, 26 Jun 2018 04:18:31 -06001803.09797journals/corr/1803.097972Women also Snowboard: Overcoming Bias in Captioning ModelsAbir DasConcern about the issue of fairness (or the lack of it) in machine learning models is gaining widespread visibility among general public, the governments as well as the researchers. This is especially alarming as AI enabled systems are becoming more and more pervasive in our society as decisions are being taken by AI agents in healthcare to autonomous driving to criminal justice and so on. Bias in any dataset is, in some way or other, a reflection of the general attitude of humankind towards dif...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.09797#dasabir
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.09797#dasabirSun, 24 Jun 2018 23:59:32 -06001806.07857journals/corr/1806.078573RUDDER: Return Decomposition for Delayed RewardsAnonymous[Summary by the author on reddit]().
Math aside, the "big idea" of RUDDER is the following: We use an LSTM to predict the return of an episode. To do this, the LSTM will have to recognize what actually causes the reward (e.g. "shooting the gun in the right direction causes the reward, even if we get the reward only once the bullet hits the enemy after travelling along the screen"). We then use a salience method (e.g. LRP or integrated gradients) to get that information out of the LSTM, and redi...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.07857#anon
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.07857#anonSat, 23 Jun 2018 13:44:40 -06001708.04527journals/corr/1708.045272The Trimmed Lasso: Sparsity and RobustnessAnonymousThey created a really nice trick to optimize the $L_0$ pseudo-norm: regularization on the sorted-by-magnitude values of the optimization variable.
Their code is available at - [The Trimmed Lasso: Sparsity and Robustness]().
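The trick can be stated concretely: the trimmed-lasso penalty sums the absolute values of all but the $k$ largest-magnitude entries, so exactly $k$ coefficients escape shrinkage entirely. A minimal numpy sketch (the function name and the $\tau$ weight are my own illustration):

```python
import numpy as np

def trimmed_lasso_penalty(beta, k, tau=1.0):
    """Trimmed Lasso: penalize the sum of the (p - k) smallest absolute
    entries of beta, leaving the k largest-magnitude entries unpenalized."""
    mags = np.sort(np.abs(beta))              # sorted ascending by magnitude
    return tau * np.sum(mags[:len(beta) - k])

# With k = 2, only the two largest-magnitude entries escape the penalty:
beta = np.array([5.0, -3.0, 0.5, -0.2])
penalty = trimmed_lasso_penalty(beta, k=2)    # 0.5 + 0.2
```

Setting the penalty to zero is then exactly the constraint that `beta` has at most `k` nonzeros, which is the connection to the $L_0$ pseudo-norm.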
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.04527#anon
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.04527#anonSat, 23 Jun 2018 00:27:55 -06001805.11357journals/corr/1805.113572CocoNet: A deep neural network for mapping pixel coordinates to color valuesAnonymousThe experiment is nice.
Though I assume the net practically memorized the data rather than inferring it, since it makes little sense to say something intelligent about a pixel's color from its location alone.
What I wonder is whether this can be made into something more clever.
A net with memory (an RNN?) that gets the pixel coordinate in addition to an estimate of the neighboring pixels, or something along those lines.
Anyhow, I wonder if there is code to replicate the results.
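The paper's basic setup is easy to sketch for oneself: a small network regressing (row, col) coordinates to RGB values, which amounts to memorizing one image. A minimal numpy sketch (toy image, architecture, and hyperparameters are my own, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8
img = rng.random((H, W, 3))                       # toy "image" to memorize

ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
X = np.stack([ys.ravel() / (H - 1), xs.ravel() / (W - 1)], axis=1)  # coords in [0, 1]
Y = img.reshape(-1, 3)

# Tiny 2-layer MLP: (row, col) -> (r, g, b), trained by plain gradient descent.
W1 = rng.normal(0, 0.5, (2, 64)); b1 = np.zeros(64)
W2 = rng.normal(0, 0.5, (64, 3)); b2 = np.zeros(3)
lr, losses = 0.1, []
for step in range(3000):
    h = np.tanh(X @ W1 + b1)                      # hidden activations
    err = h @ W2 + b2 - Y                         # prediction error
    losses.append(np.mean(err ** 2))
    gh = (err @ W2.T) * (1 - h ** 2)              # backprop through tanh
    W2 -= lr * (h.T @ err) / len(X); b2 -= lr * err.mean(0)
    W1 -= lr * (X.T @ gh) / len(X); b1 -= lr * gh.mean(0)
# The loss falls steadily: the net is fitting (memorizing) the image.
```

That the net can drive this loss down on arbitrary random colors is consistent with the memorization reading above: there is no structure to infer, only a lookup table to compress.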
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.11357#anon
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.11357#anonSat, 23 Jun 2018 00:10:41 -060010.1016/j.compeleceng.2008.12.0052A hybrid intrusion detection system design for computer network securityKerim Can Kalıpcıoğlu## IDS approaches for events
**Misuse-based:**
Detects events that violate system policy. *Snort* is a signature-based open-source IDS system used for misuse detection in this research.
**Anomaly-based:**
Detects events that contain abnormal activity. Uses statistical, heuristic and data mining methods. *Packet header anomaly detector (PHAD)* [1] and *Network traffic anomaly detector (NETAD)* [2] are used as *Snort* preprocessors for anomaly detection.
## Hybrid architecture
**Preproce...
http://www.shortscience.org/paper?bibtexKey=10.1016/j.compeleceng.2008.12.005#kkalipcioglu
http://www.shortscience.org/paper?bibtexKey=10.1016/j.compeleceng.2008.12.005#kkalipciogluThu, 21 Jun 2018 13:37:46 -06001801.00631journals/corr/1801.006312Deep Learning: A Critical AppraisalPavan RavishankarDeep Learning has a number of shortcomings.
(1) Requires a lot of data: Humans can learn abstract concepts with far less training data than current deep learning. E.g., if we are told what an “Adult” is, we can answer questions like “How many adults are there at home?” or “Is he an adult?” without much data. Convolutional networks can handle translational invariance, but identifying other transformations requires a lot more data, more filters, or different architectures.
(2) Lack of transfer: Mos...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.00631#pavansettigunte
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.00631#pavansettigunteWed, 20 Jun 2018 17:07:38 -06001708.02002journals/corr/1708.020023Focal Loss for Dense Object DetectionRyanDsouzaIn object detection, boosts in speed and accuracy are mostly gained through network architecture changes. This paper takes a different route towards achieving that goal: it introduces a new loss function called the focal loss.
The authors identify class imbalance as the main obstacle to one-stage detectors achieving results as good as those of two-stage detectors.
The loss function they introduce is a dynamically scaled cross-entropy loss, where the scaling factor decays to zero as the confide...
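The scaling factor described above is the $(1-p_t)^\gamma$ term: for well-classified examples $p_t \to 1$ and the loss vanishes, down-weighting the flood of easy background examples that dominates one-stage detectors. A minimal numpy sketch of the binary form, with the paper's $\alpha$-balancing (variable names are mine):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).
    gamma = 0 recovers the (alpha-weighted) cross-entropy."""
    p_t = np.where(y == 1, p, 1 - p)              # prob. assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy, well-classified positive (p_t = 0.9) is down-weighted 100x at gamma = 2:
ce = focal_loss(np.array(0.9), 1, gamma=0.0, alpha=1.0)   # plain cross-entropy
fl = focal_loss(np.array(0.9), 1, gamma=2.0, alpha=1.0)   # 0.01 * ce
```

Hard examples (small $p_t$) keep nearly their full cross-entropy loss, which is why the gradient signal concentrates on them.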
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.02002#ryandsouza
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.02002#ryandsouzaMon, 18 Jun 2018 12:25:16 -06001702.06559journals/corr/1702.065592Active One-shot LearningFlorian WindolfThe paper combines reinforcement learning with active learning to learn when to request labels to improve prediction accuracy.
- The model can either predict the label at time step $t$ or request it in the next time step, in the form of a one-hot vector output of an LSTM with the previous label (if requested) and the current image as input.
- A reward is issued based on the outcome of requesting labels (-0.05), or correctly (+1) or incorrectly (-1) predicting the label.
- The optimal strategy in...
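The action interface and reward scheme in the list above can be sketched concretely (the one-hot layout with a trailing "request" unit is my reading of the summary, not verified against the paper's code):

```python
import numpy as np

def decode_action(output, n_classes):
    """The LSTM's one-hot output has n_classes + 1 units; the extra
    unit means 'request the true label at the next time step'."""
    idx = int(np.argmax(output))
    return ("request", None) if idx == n_classes else ("predict", idx)

def reward(action, prediction=None, true_label=None):
    """Reward scheme from the summary: request -0.05, correct +1, incorrect -1."""
    if action == "request":
        return -0.05
    return 1.0 if prediction == true_label else -1.0
```

With these payoffs the agent is pushed to pay the small request cost exactly when it is uncertain, since a wrong guess (-1) is far costlier than a label request (-0.05).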
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.06559#florianwindolf
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.06559#florianwindolfMon, 18 Jun 2018 07:50:45 -06001710.07283journals/corr/1710.072832Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive LearningluyuchenThe paper starts with a BNN with latent variables and proposes an entropy-based and a variance-based measure of prediction uncertainty. For each uncertainty measure, the authors propose a decomposition into an aleatoric term and an epistemic term. A simple toy regression experiment validates this decomposition and its measure of uncertainty. Then the authors try to improve performance on the toy regression experiment by plugging this uncertainty measure into an active learning scheme. For each batch, they ...
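For the variance-based measure, the decomposition is the law of total variance over posterior samples: total predictive variance splits into an aleatoric term (the average noise of individual predictions) and an epistemic term (the disagreement between posterior samples). A small numpy sketch (the toy numbers are mine):

```python
import numpy as np

def decompose_variance(means, variances):
    """Law of total variance over posterior samples theta_i:
    aleatoric = E_theta[Var(y | theta)], epistemic = Var_theta[E(y | theta)]."""
    return np.mean(variances), np.var(means)

# Three posterior samples that roughly agree on the mean but predict noise:
means = np.array([0.9, 1.1, 1.0])       # per-sample predictive means
variances = np.array([0.04, 0.05, 0.03])
aleatoric, epistemic = decompose_variance(means, variances)
```

In an active learning scheme only the epistemic part is useful: it shrinks with more data, whereas the aleatoric part is irreducible observation noise.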
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.07283#luyuchen
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.07283#luyuchenSun, 17 Jun 2018 22:50:17 -06001801.04016journals/corr/1801.040163Theoretical Impediments to Machine Learning With Seven Sparks from the Causal RevolutionPavan RavishankarThe paper overviews the importance of causality in AI and highlights important aspects of it. The current state of AI deals only with association/curve fitting of data, without the need for a model. But this is far from human-like intelligence: humans have a mental representation that is manipulated from time to time using data and queried with What If? questions. To incorporate this, one needs to add two more layers on top of the curve-fitting module: interventions (What if I do this?) and counterfactuals (What i...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.04016#pavansettigunte
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.04016#pavansettigunteFri, 08 Jun 2018 14:07:07 -06001609.05518journals/corr/1609.055184Towards Deep Symbolic Reinforcement LearningPavan RavishankarDRL has a lot of disadvantages: large data requirements, slow learning, difficult interpretation, difficult transfer, no causality, analogical reasoning done at a statistical level rather than an abstract level, etc. These can be overcome by adding a symbolic front end on top of the DL layer before feeding it to the RL agent. The symbolic front end gives the advantages of smaller state-space generalization, flexible predicate length, and easier combination of predicate expressions. DL avoids manual creation of features ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1609.05518#pavansettigunte
http://www.shortscience.org/paper?bibtexKey=journals/corr/1609.05518#pavansettigunteMon, 04 Jun 2018 02:29:11 -06001511.04707journals/corr/1511.047072Deep Linear Discriminant AnalysisAnonymousThere are 2 implementations for the paper:
1. [Reference Implementation of Deep Linear Discriminant Analysis (DeepLDA)]().
2. [VahidooX/DeepLDA]().
[It seems something is wrong with the cost function implemented]().
Also, while they derive the gradient, they didn't verify it, and the implementation uses Theano's automatic differentiation (while other autodiff frameworks can't work it out).
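The missing verification is cheap to do with a central-difference gradient check, which is exactly what one would run against the derived DeepLDA gradient. A generic sketch (the quadratic example is my own, not the DeepLDA objective):

```python
import numpy as np

def check_grad(f, grad_f, x, eps=1e-6):
    """Compare an analytic gradient against central finite differences;
    returns the largest absolute discrepancy over the coordinates."""
    num = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        num[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return np.max(np.abs(num - grad_f(x)))

# Example: f(x) = x.x with analytic gradient 2x -- the discrepancy is tiny.
x = np.array([1.0, -2.0, 3.0])
err = check_grad(lambda v: v @ v, lambda v: 2 * v, x)
```

Running such a check on the hand-derived expression would settle whether the suspected cost-function bug is in the derivation or in the implementation.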
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.04707#anon
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.04707#anonSat, 26 May 2018 17:54:32 -0600conf/iccv/ZhangS132Saliency Detection: A Boolean Map ApproachSaeed IzadiMain Purpose:
* The main goal of the proposed method is to exploit a global perception mechanism, known as figure-ground segregation, together with the Boolean Map Theory of visual attention, to compute the saliency map.
Drawbacks of previous works:
* Most previous works do not exploit the topological structure of an image in saliency calculation. This paper therefore aims to exploit the topological structure of the scene in its saliency calculation.
Main Idea:
* Relying on Boolean Map Theory of visual attention, an...
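The Boolean-map idea can be sketched end to end: threshold each channel at several levels, and within each resulting Boolean map keep only the "surrounded" regions, i.e. connected on-components that do not touch the image border (the figure-ground cue); averaging these masks gives a saliency map. A minimal sketch (the threshold grid and function names are my own simplification of BMS):

```python
import numpy as np
from collections import deque

def surrounded(bmap):
    """Keep 'on' pixels whose connected component does NOT touch the
    border -- the surrounded (figure) regions of one Boolean map."""
    h, w = bmap.shape
    reach = np.zeros_like(bmap, dtype=bool)
    q = deque((i, j) for i in range(h) for j in range(w)
              if bmap[i, j] and (i in (0, h - 1) or j in (0, w - 1)))
    for i, j in q:
        reach[i, j] = True
    while q:                                     # flood fill from the border
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < h and 0 <= b < w and bmap[a, b] and not reach[a, b]:
                reach[a, b] = True
                q.append((a, b))
    return bmap & ~reach

def bms_saliency(img, thresholds=range(32, 256, 32)):
    """Average the surrounded regions over Boolean maps obtained by
    thresholding each channel at several levels (and their complements)."""
    maps = []
    for c in range(img.shape[2]):
        for t in thresholds:
            maps.append(surrounded(img[:, :, c] > t))
            maps.append(surrounded(img[:, :, c] <= t))
    return np.mean(maps, axis=0)

# A bright blob on a dark background pops out as salient; the border does not.
img = np.zeros((7, 7, 1))
img[2:5, 2:5, 0] = 200
sal = bms_saliency(img)
```

The flood fill from the border is what encodes the topological (enclosure) structure that the summary says earlier methods ignore.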
http://www.shortscience.org/paper?bibtexKey=conf/iccv/ZhangS13#saeedizadi
http://www.shortscience.org/paper?bibtexKey=conf/iccv/ZhangS13#saeedizadiSat, 19 May 2018 07:29:15 -06001803.08840journals/corr/abs-1803-088403Effective deep learning training for single-image super-resolution in endomicroscopy exploiting video-registration-based reconstructionSaeed IzadiMain purpose:
* This work proposes a software-based resolution augmentation method which is more agile and simpler to implement than hardware engineering solutions.
* The paper examines three deep-learning single-image super-resolution techniques on pCLE images.
* A video-registration-based method is proposed to estimate ground-truth HR pCLE images (this can be considered the main objective of the paper).
Highlights:
* The paper emphasises that this is the first work to address the image resolut...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1803-08840#saeedizadi
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1803-08840#saeedizadiFri, 18 May 2018 22:04:04 -0600conf/nips/ChristianoLBMLA172Deep Reinforcement Learning from Human Preferences.Tianxiao Zhao- explore RL systems with (non-expert) human preferences between pairs of trajectory segments;
- run experiments on some RL tasks, namely **Atari** and **MuJoCo**, and show effectiveness of this approach;
- advantages mentioned:
- no need to access to the reward function;
- less than 1% feedback needed -> reduce the cost of human oversight;
- can learn complex novel behaviors.
## Introduction
**Challenges**
- goals complex, poorly defined, or hard to specify;
- reward function -> behaviors...
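The fitting step behind "no need to access the reward function" is a Bradley-Terry style preference model: the probability that a human prefers one segment is a softmax over the sums of predicted rewards along each segment, trained with cross-entropy against the human labels. A small numpy sketch (function names are mine, following the paper's formulation):

```python
import numpy as np

def preference_prob(r1, r2):
    """P[segment 1 preferred] = exp(sum r1) / (exp(sum r1) + exp(sum r2)),
    where r1, r2 are predicted per-step rewards along each segment."""
    s1, s2 = np.sum(r1), np.sum(r2)
    m = max(s1, s2)                    # subtract the max for numerical stability
    e1, e2 = np.exp(s1 - m), np.exp(s2 - m)
    return e1 / (e1 + e2)

def preference_loss(r1, r2, mu1):
    """Cross-entropy between the human label mu1 = P[human prefers segment 1]
    and the model's preference probability."""
    p1 = preference_prob(r1, r2)
    return -(mu1 * np.log(p1) + (1 - mu1) * np.log(1 - p1))

# A segment with higher predicted return is preferred with p = sigmoid(2):
p = preference_prob([1.0, 1.0], [0.0, 0.0])
```

Minimizing this loss over the comparison dataset shapes the reward predictor, which the RL agent then optimizes in place of the true (inaccessible) reward.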
http://www.shortscience.org/paper?bibtexKey=conf/nips/ChristianoLBMLA17#txzhao
http://www.shortscience.org/paper?bibtexKey=conf/nips/ChristianoLBMLA17#txzhaoTue, 15 May 2018 00:58:49 -06001705.08245journals/corr/1705.082452Enhanced Experience Replay Generation for Efficient Reinforcement LearningTianxiao Zhao- *issue:* RL on real systems -> sparse and slow data sampling;
- *solution:* pre-train the agent with the EGAN;
- *performance:* ~20% improvement in training time at the beginning of learning compared to no pre-training; ~5% improvement and smaller variations compared to GAN pre-training.
## Introduction
5G telecom systems -> fulfill ultra-low latency, high robustness, quick response to changed capacity needs, and dynamic allocation of functionality.
*Problems:*
1. exploration has an impact ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.08245#txzhao
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.08245#txzhaoTue, 15 May 2018 00:56:42 -0600journals/sigart/Sutton912Dyna, an Integrated Architecture for Learning, Planning, and ReactingTianxiao ZhaoMain idea: planning is 'trying things in your head' using an internal model of the world
#### Diagram
#### Generic algorithm
- step 1-3: standard reinforcement learning agent
- step 4: learning of domain knowledge - action model
- step 5: RL from hypothetical, model-generated experiences - planning
#### Action model
input: state, action; output: immediate resulting state and reward
search control: how to select hypothetical state and action
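Steps 1-5 of the generic algorithm map directly onto tabular Dyna-Q, which is easy to run on a toy chain world (the environment, hyperparameters, and `dyna_q` name are my own illustration, with uniform-random search control):

```python
import random

def dyna_q(n_states=5, episodes=30, n_planning=20,
           alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Dyna-Q on a chain: actions 0/1 move left/right,
    reward 1.0 on reaching the rightmost (terminal) state."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_states) for a in (0, 1)}
    model = {}                                   # (s, a) -> (reward, next state)

    def step(s, a):                              # the real environment
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    def greedy(s):                               # greedy with random tie-breaking
        best = max(Q[(s, 0)], Q[(s, 1)])
        return rng.choice([a for a in (0, 1) if Q[(s, a)] == best])

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = rng.choice((0, 1)) if rng.random() < eps else greedy(s)
            r, s2 = step(s, a)                   # steps 1-3: act and learn from real experience
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
            model[(s, a)] = (r, s2)              # step 4: learn the action model
            for _ in range(n_planning):          # step 5: plan on hypothetical experience
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, 0)], Q[(ps2, 1)]) - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
```

The planning loop is the "trying things in your head" part: each replayed model transition is treated exactly like a real one, so value propagates along the chain far faster than real experience alone would allow.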
## Potential problems
1. reliance on superv...
http://www.shortscience.org/paper?bibtexKey=journals/sigart/Sutton91#txzhao
http://www.shortscience.org/paper?bibtexKey=journals/sigart/Sutton91#txzhaoTue, 15 May 2018 00:49:28 -0600