ShortScience.org Latest Summaries
http://www.shortscience.org/

**OPEM: Open Source PEM Cell Simulation Tool** by Sepand Haghighi (Sat, 22 Sep 2018)
Modeling and simulation of proton-exchange membrane fuel cells (PEMFC) can serve as a powerful tool in the research and development of renewable energy sources. The Open-Source PEMFC Simulation Tool (OPEM) is a modeling tool for evaluating the performance of proton exchange membrane fuel cells. This package is a combination of static and dynamic models that predict the optimum operating parameters of a PEMFC. OPEM contains generic models that accept as input not only values of the operating vari...
http://www.shortscience.org/paper?bibtexKey=10.21105/joss.00676#sepandhaghighi
**Everybody Dance Now** by Oleksandr Bailo (Sat, 08 Sep 2018)
This paper presents a per-frame image-to-image translation system that copies the motion of a person in a source video onto a target person. For example, the source video might be a professional dancer performing complicated moves, while the target person is you. With this approach, it is possible to generate a video of you dancing like a professional. Check the authors' [video]() for a visual explanation.
**Data preparation**
The authors have manually recorded high-resolution vide...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1808.07371#ukrdailo
**Compositional Obverter Communication Learning From Raw Visual Input** by Ben Bogin (Wed, 05 Sep 2018)
This paper proposes a new training method for multi-agent communication settings. They study the following referential game: a speaker sees an image of a 3D-rendered object and describes it to a listener. The listener sees a different image and must decide whether it is the same object the speaker described (i.e., has the same color and shape). The game can only be completed successfully if a communication protocol emerges that can express the color and shape the speaker sees.
The main contribution o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02341#benbogin
**PyCM: Multiclass confusion matrix library in Python** by Sepand Haghighi (Sun, 02 Sep 2018)
PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrices, and is a proper tool for post-classification model evaluation, supporting most class and overall statistics parameters. PyCM is the Swiss-army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers.
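The core object PyCM builds on can be sketched in a few lines of plain Python (a minimal illustration only, not PyCM's actual API):

```python
# Minimal sketch: build a multiclass confusion matrix from two label
# vectors. Rows index actual labels, columns index predictions.
from collections import Counter

def confusion_matrix(actual, predicted):
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    return {a: {p: counts[(a, p)] for p in labels} for a in labels}

cm = confusion_matrix([0, 1, 2, 2], [0, 2, 2, 1])
# cm[2][2] == 1: one class-2 sample predicted correctly.
```

PyCM itself derives its many class-level and overall statistics from exactly this kind of table.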
http://www.shortscience.org/paper?bibtexKey=10.21105/joss.00729#sepandhaghighi
**From Babies to Robots: The Contribution of Developmental Robotics to Developmental Psychology** by Natalia Diaz Rodriguez, PhD (Sat, 01 Sep 2018)
Joint summary from
Developmental robotics is the interdisciplinary approach to the autonomous design of behavioural and cognitive capabilities in artificial agents (robots) that takes direct inspiration from the developmental principles and mechanisms observed in natural cognitive systems. It relies on a highly interdisciplinary effort spanning empirical developmental sciences such as developmental psychology, neuroscience, and comparative psychology, and computational and engineering disciplin...
http://www.shortscience.org/paper?bibtexKey=10.1111/cdep.12282#natalia
**Learning with Opponent-Learning Awareness** by mnoukhov (Thu, 23 Aug 2018)
Normal RL agents in multi-agent scenarios treat their opponents as a static part of the environment, not taking into account the fact that other agents are learning as well. This paper proposes LOLA, a learning rule that takes the agency and learning of opponents into account by optimizing the "return under one-step look-ahead of opponent learning".
So instead of optimizing under the current parameters of agent 1 and 2
$$V^1(\theta_i^1, \theta_i^2)$$
LOLA proposes to optimize taking into acc...
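The look-ahead idea can be illustrated on a toy differentiable game (the payoffs below are made up for illustration; LOLA's actual derivation works with policy-gradient estimates):

```python
# Toy two-player game with made-up payoffs V1, V2 over scalar
# parameters t1, t2. Agent 1 compares a naive gradient step against
# a LOLA-style step that anticipates agent 2's one-step update.
def grad(f, x, eps=1e-5):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def V1(t1, t2): return -(t1 - t2) ** 2
def V2(t1, t2): return -(t2 - 0.5 * t1) ** 2

alpha = 0.1
t1, t2 = 1.0, -1.0

# Naive update: ascend own value with the opponent held fixed.
naive_t1 = t1 + alpha * grad(lambda a: V1(a, t2), t1)

# LOLA-style update: evaluate the gradient at the opponent's
# anticipated next parameters, not the current ones.
t2_next = t2 + alpha * grad(lambda b: V2(t1, b), t2)
lola_t1 = t1 + alpha * grad(lambda a: V1(a, t2_next), t1)
```

The two updates differ precisely because agent 1 differentiates its return through the opponent's one-step learning move.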
http://www.shortscience.org/paper?bibtexKey=journals/corr/1709.04326#mnoukhov
**Towards Robust Evaluations of Continual Learning** by Natalia Diaz Rodriguez, PhD (Mon, 13 Aug 2018)
Through a likelihood-focused derivation of a variational inference (VI) loss, Variational Generative Experience Replay (VGER) presents the closest appropriate likelihood-focused alternative to Variational Continual Learning (VCL), the state-of-the-art prior-focused approach to continual learning.
In non-continual learning, the aim is to learn parameters $\omega$ using labelled training data $\mathcal{D}$ to infer $p(y|\omega, x)$. In the continual learning context, instead, the data is not in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.09733#natalia
**Banach Wasserstein GAN** by Artëm Sobolev (Fri, 10 Aug 2018)
The paper extends the [WGAN]() paper by replacing the L2 norm in the transportation cost with some other metric $d(x, y)$. Following the same reasoning as in the WGAN paper, one arrives at a dual optimization problem similar to WGAN's, except that the critic $f$ has to be 1-Lipschitz w.r.t. the given norm (rather than L2). This, in turn, means that the critic's gradient (w.r.t. the input $x$) has to be bounded in the dual norm (defined only in Banach spaces, hence the name). The authors build upon the [WGAN-GP...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.06621#artems
**Multi-layer Representation Learning for Medical Concepts** by Joseph Paul Cohen (Tue, 31 Jul 2018)
This model, called Med2Vec, is inspired by Word2Vec: it is Word2Vec for time series of patient visits with ICD codes. The model learns embeddings for medical codes as well as for patient demographics.
The context is temporal. For each $x_t$ as input the model predicts $x_{t+1}$ and $x_{t-1}$ or more depending on the temporal window size.
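The temporal-context objective can be sketched as generating skip-gram-style training pairs over a window of visits (a simplified illustration; the actual Med2Vec model also exploits within-visit code co-occurrence):

```python
# Sketch of temporal context pairs: each visit x_t is used to predict
# its neighbors within the window, skip-gram style. Illustrative only.
def context_pairs(visits, window=1):
    pairs = []
    for t, x_t in enumerate(visits):
        for dt in range(-window, window + 1):
            if dt != 0 and 0 <= t + dt < len(visits):
                pairs.append((x_t, visits[t + dt]))
    return pairs

pairs = context_pairs(["v0", "v1", "v2"], window=1)
# The middle visit "v1" is paired with both of its neighbors.
```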
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.05568#joecohen
**A Comparison of Word Embeddings for the Biomedical Natural Language Processing** by Joseph Paul Cohen (Sat, 28 Jul 2018)
This paper demonstrates that Word2Vec \cite{1301.3781} can extract relationships between words and produce latent representations useful for medical data. They explore this model on different datasets which yield different relationships between words.
The Word2Vec model works like an autoencoder that predicts the context of a word. The context of a word is composed of the surrounding words, as shown below. Given the word in the center, the neighboring words are predicted through a bottleneck in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.00400#joecohen
**An Experimental Evaluation of the Generalizing Capabilities of Process Discovery Techniques and Black-Box Sequence Models** by Niek Tax (Sat, 28 Jul 2018)
# Contributions
The contribution of this paper is three-fold:
1. We present a method to use *process models* as interpretable sequence models that have a stronger notion of interpretability than what is generally used in the machine learning field (see Section *process models* below),
2. We show that this approach enables the comparison of traditional sequence models (RNNs, LSTMs, Markov Models) with techniques from the research field of *automated process discovery*,
3. We show on a collection ...
http://www.shortscience.org/paper?bibtexKey=10.1007/978-3-319-91704-7_11#niektax
**RUDDER: Return Decomposition for Delayed Rewards** by Anonymous (Wed, 25 Jul 2018)
[Summary by author /u/SirJAM_armedi]().
Math aside, the "big idea" of RUDDER is the following: We use an LSTM to predict the return of an episode. To do this, the LSTM will have to recognize what actually causes the reward (e.g. "shooting the gun in the right direction causes the reward, even if we get the reward only once the bullet hits the enemy after travelling along the screen"). We then use a salience method (e.g. LRP or integrated gradients) to get that information out of the LSTM, and r...
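The redistribution step can be sketched as follows, assuming the salience method has already produced per-step relevance scores (a hypothetical toy, not the paper's exact procedure):

```python
# Redistribute an episode's delayed return across time steps in
# proportion to per-step relevance scores (e.g. as LRP applied to the
# return-predicting LSTM might provide). Illustrative sketch only.
def redistribute(rewards, relevance):
    episode_return = sum(rewards)
    total = sum(relevance)
    return [episode_return * r / total for r in relevance]

# Delayed reward arrives at the last step, but the first action
# ("shooting the gun") is the most relevant one.
new_rewards = redistribute([0, 0, 0, 10], [0.7, 0.1, 0.1, 0.1])
```

The redistributed rewards sum to the original return, so the optimal policy is unchanged while credit arrives much earlier.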
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.07857#anon
**Variational Inference with Normalizing Flows** by CodyWild (Tue, 24 Jul 2018)
This paper argues for the use of normalizing flows - a way of building up new probability distributions by applying multiple sets of invertible transformations to existing distributions - as a way of building more flexible variational inference models.
The central premise of a variational autoencoder is that of learning an approximation to the posterior distribution of latent variables - p(z|x) - and parameterizing that distribution according to values produced by a neural network. In typical ...
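The mechanics rest on the change-of-variables formula: push a base density through an invertible map and correct by the log-determinant of the Jacobian. A minimal one-dimensional sketch (single affine transform over a standard-normal base; parameter values are made up):

```python
import math

# Change-of-variables sketch for one invertible affine transform
# f(z) = a*z + b applied to a standard normal base density:
# log q(x) = log q_base(f^{-1}(x)) - log |det df/dz|.
def log_normal(z):
    return -0.5 * (z * z + math.log(2 * math.pi))

def flow_log_density(x, a=2.0, b=1.0):
    z = (x - b) / a                          # invert the transform
    return log_normal(z) - math.log(abs(a))  # Jacobian correction
```

A real flow stacks many such transforms, summing the log-Jacobian terms, which is what makes the approximate posterior more flexible than a plain Gaussian.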
http://www.shortscience.org/paper?bibtexKey=journals/corr/1505.05770#decodyng
**Deep Extreme Cut: From Extreme Points to Object Segmentation** by Oleksandr Bailo (Mon, 23 Jul 2018)
This paper introduces CNN-based segmentation of an object that is defined by a user via four extreme points (the left-most, right-most, top, and bottom points of the object). Interestingly, related work has shown that clicking extreme points is about 5 times faster than drawing a bounding box.
The extreme points have several goals in this work. First, they are used as a bounding box to crop the object of interest. Secondly, they are utilized to create a heatmap with activations in the regions o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09081#ukrdailo
**Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++** by Oleksandr Bailo (Mon, 23 Jul 2018)
In this paper, the authors develop a system for automatic as well as interactive annotation (i.e. segmentation) of a dataset. In the automatic mode, bounding boxes are generated by another network (e.g. Faster R-CNN), while in the interactive mode the input bounding box around an object of interest comes from a human in the loop.
The system is composed of the following parts:
1. **Residual encoder with skip connections**. This step acts as a feature extractor. The ResNet-50 with few modifi...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.09693#ukrdailo
**The challenge of realistic music generation: modelling raw audio at scale** by CodyWild (Sun, 22 Jul 2018)
This paper draws from two strains of recent work: the hierarchical music modeling of MusicVAE, which intentionally models musical structure at both local and more global levels, and the discrete autoencoder approach of Vector Quantized VAEs, which seeks to maintain the overall structure of a VAE while applying a less aggressive form of regularization.
The goal of this paper is to build a model that can generate music, not from that music’s symbolic representation - lists of notes - but from ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.10474#decodyng
**Quasi-Monte Carlo Variational Inference** by Artëm Sobolev (Sun, 22 Jul 2018)
Variational Inference builds around the ELBO (Evidence Lower BOund) -- a lower bound on the marginal log-likelihood of the observed data, $\log p(x) = \log \int p(x, z) dz$, which is typically intractable. The ELBO makes use of an approximate posterior to form a lower bound:
$$
\log p(x) \ge \mathbb{E}_{q(z|x)} \log \frac{p(x, z)}{q(z|x)}
$$
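A plain Monte Carlo estimate of this bound (the baseline that the paper's quasi-Monte Carlo sampling improves on) can be sketched on a toy conjugate model; the model and parameter values below are made up for illustration:

```python
import math, random

# Toy model: p(z) = N(z; 0, 1), p(x|z) = N(x; z, 1),
# approximate posterior q(z|x) = N(z; mu, sigma^2).
def log_normal(v, mean, std):
    return -0.5 * math.log(2 * math.pi * std * std) \
           - (v - mean) ** 2 / (2 * std * std)

def elbo_estimate(x, mu, sigma, n=10000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = mu + sigma * rng.gauss(0, 1)   # z ~ q(z|x)
        log_joint = log_normal(z, 0, 1) + log_normal(x, z, 1)
        total += log_joint - log_normal(z, mu, sigma)
    return total / n
```

With the exact posterior $q(z|x) = \mathcal{N}(x/2, 1/2)$ the bound is tight; quasi-Monte Carlo replaces the i.i.d. draws of $z$ with a low-discrepancy sequence to reduce the estimator's variance.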
# Introduction to Quasi Monte Carlo
It's assumed that both the joint $p(x, z)$ (or, equivalently, the likelihood $p(x|z)$ and the prior $p(z)$) and the appro...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.01604#artems
**Learning with Opponent-Learning Awareness** by CodyWild (Fri, 20 Jul 2018)
A central question of this paper is: under what circumstances will agents that have been trained to optimize their own reward implement strategies - like tit for tat - that are more sophisticated and yield higher overall reward than each agent simply pursuing its dominant strategy? The games under consideration here are “general sum” games like the Iterated Prisoner’s Dilemma, where each agent’s dominant strategy is to defect, but with some amount of coordination or reciprocity, better...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1709.04326#decodyng
**Insights on representational similarity in neural networks with canonical correlation** by CodyWild (Thu, 19 Jul 2018)
The overall goal of the paper is to measure how similar different layer activation profiles are to one another, in hopes of quantifying the similarity of the representations that different layers learn. With such a measure, you could ask questions like: “how similar are the representations learned by different networks on the same task?” and “what is the dynamic of representational change in a given layer throughout training?”
Canonical Corre...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.05759#decodyng
**BRUNO: A Deep Recurrent Model for Exchangeable Data** by Artëm Sobolev (Tue, 17 Jul 2018)
A Bayesian best expresses beliefs about the next observation $x_{n+1}$ after observing $x_1, \dots, x_n$ using the **posterior predictive distribution** $p(x_{n+1}\vert x_1, \dots, x_n)$. Typically one invokes de Finetti's theorem and assumes there exists an underlying model $p(x\vert\theta)$, hence $p(x_{n+1}\vert x_1, \dots, x_n) = \int p(x_{n+1} \vert \theta) p(\theta \vert x_1, \dots, x_n) d\theta$; however, this integral is far from tractable in most cases. Nevertheless, h...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.07535#artems
**Learning by Asking Questions** by Oleksandr Bailo (Mon, 09 Jul 2018)
This paper is about an interactive Visual Question Answering (VQA) setting in which agents must ask questions about images in order to learn. This closely mimics how people learn from each other using natural language and has strong potential to enable much faster learning from less data. It is referred to as learning by asking (LBA) throughout the paper. The approach is composed of three models:
1. **Question proposal module** is responsible for generating _important_ questions about the image. It is a combination of...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.01238#ukrdailo
**Actor and Action Video Segmentation from a Sentence** by Oleksandr Bailo (Sun, 08 Jul 2018)
This paper performs pixel-wise segmentation of the object of interest which is specified by a sentence. The model is composed of three main components: a **textual encoder**, a **video encoder**, and a **decoder**.
- **Textual encoder** is a pre-trained word2vec model followed by a 1D CNN.
- **Video encoder** is a 3D CNN to obtain a visual representation of the video (can be combined with optical flow to obtain motion information).
- **Decoder**. Given a sentence representation $T$ a separate filt...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.07485#ukrdailo
**Embodied Question Answering** by Oleksandr Bailo (Wed, 04 Jul 2018)
This paper introduces a new AI task - Embodied Question Answering. The goal of this task is for an agent to answer a question by observing the environment through a single egocentric RGB camera while navigating inside the environment. The agent has four natural modules:
1. **Vision**. 224x224 RGB images are processed by CNN to produce a fixed-size representation. This CNN is pretrained on pixel-to-pixel tasks such as RGB reconstruction, semantic segmentation, and depth est...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.11543#ukrdailo
**Tree-to-Sequence Attentional Neural Machine Translation** by Tim Miller (Wed, 04 Jul 2018)
This work extends sequence-to-sequence models for machine translation by using syntactic information on the source-language side. This paper looks at the translation task where English is the source language and Japanese is the target language. The dataset is the ASPEC corpus of scientific paper abstracts that seem to be in both English and Japanese? (See note below). The trees for the source (English) are generated by running the ENJU parser on the English data, resulting in binary trees, and ...
http://www.shortscience.org/paper?bibtexKey=10.18653/v1/p16-1078#tmills
**Taskonomy: Disentangling Task Transfer Learning** by Oleksandr Bailo (Tue, 03 Jul 2018)
The goal of this work is to perform transfer learning among numerous tasks and to discover visual relationships among them. Specifically, while we might intuitively guess that the depth of an image and its surface normals are related, this work takes a step forward and discovers beneficial relationships among 26 tasks in terms of task transferability - many of them not obvious. This is important for scenarios where an insufficient annotation budget is available for the target task; thus, learned repr...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.08328#ukrdailo
**Adversarial Attacks on Neural Network Policies** by David Stutz (Mon, 02 Jul 2018)
Huang et al. study adversarial attacks on reinforcement learning policies. One of the main problems, in contrast to supervised learning, is that there might not be a reward in every time step, meaning there is no clear objective to use. However, this is essential when crafting adversarial examples, as they are mostly based on maximizing the training loss. To avoid this problem, Huang et al. assume a well-trained policy; the policy is expected to output a distribution over actions. Then, adversarial...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.02284#davidstutz
**Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning** by David Stutz (Thu, 28 Jun 2018)
Biggio and Roli provide a comprehensive survey and discussion of work in adversarial machine learning. In contrast to related work [1,2], they explicitly discuss the relation of recent developments regarding the security of deep neural networks (as primarily discussed in [1] and [2]) to adversarial machine learning in general. The latter can be traced back to early work starting in 2004, e.g. involving adversarial attacks on spam filters. As a result, the terminology used by Biggio and Roli is slig...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.03141#davidstutz
**Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey** by David Stutz (Thu, 28 Jun 2018)
Akhtar and Mian present a comprehensive survey of attacks and defenses of deep neural networks, specifically in computer vision. Published on arXiv in January 2018, but probably written prior to August 2017, the survey includes recent attacks and defenses. For example, Table 1 presents an overview of attacks on deep neural networks – categorized by knowledge, target and perturbation measure. The authors also provide a strength measure – in the form of a 1-5 star “rating”. Personally, ho...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.00553#davidstutz
**Adversarial Examples: Attacks and Defenses for Deep Learning** by David Stutz (Thu, 28 Jun 2018)
Yuan et al. present a comprehensive survey of attacks, defenses and studies regarding the robustness and security of deep neural networks. Published on arXiv in December 2017, it includes the most recent attacks and defenses. For example, Table 1 lists all known attacks – Yuan et al. categorize the attacks according to the level of knowledge needed, targeted or non-targeted, the optimization needed (e.g. iterative) as well as the perturbation measure employed. As a result, Table 1 gives a solid o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.07107#davidstutz
**Adversarial Diversity and Hard Positive Generation** by David Stutz (Thu, 28 Jun 2018)
Rozsa et al. propose PASS, a perceptual similarity metric invariant to homographies, to quantify adversarial perturbations. In particular, PASS is based on the structural similarity metric SSIM [1]; specifically
$PASS(\tilde{x}, x) = SSIM(\psi(\tilde{x},x), x)$
where $\psi(\tilde{x}, x)$ transforms the perturbed image $\tilde{x}$ to the image $x$ by applying a homography $H$ (which can be found through optimization). Based on this similarity metric, they consider additional attacks which creat...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.01775#davidstutz
**Measuring Neural Net Robustness with Constraints** by David Stutz (Thu, 28 Jun 2018)
Bastani et al. propose formal robustness measures and an algorithm for approximating them for piece-wise linear networks. Specifically, the notion of robustness is similar to related work:
$\rho(f,x) = \inf\{\epsilon \geq 0 \mid f \text{ is not } (x,\epsilon)\text{-robust}\}$
where $(x,\epsilon)$-robustness demands that for every $x'$ with $\|x'-x\|_\infty \leq \epsilon$ it holds that $f(x') = f(x)$ – in other words, the label does not change for perturbations $\eta = x'-x$ which are small in terms of the $L_\...
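For intuition, this measure can be approximated by bisection on a toy one-dimensional classifier (illustrative only; the paper instead encodes the computation as tractable constraint systems for piecewise-linear networks):

```python
# Approximate rho(f, x): the smallest epsilon such that f is not
# constant on the L_inf ball of radius epsilon around x.
def f(x):
    return 1 if x >= 0.3 else 0   # toy 1-D classifier

def robust(x, eps, grid=1000):
    # check f is constant on [x - eps, x + eps] by dense sampling
    return all(f(x + eps * (2 * i / grid - 1)) == f(x)
               for i in range(grid + 1))

def rho(x, lo=0.0, hi=1.0, iters=40):
    for _ in range(iters):        # bisect the robust/non-robust boundary
        mid = (lo + hi) / 2
        if robust(x, mid):
            lo = mid
        else:
            hi = mid
    return hi
```

For the point $x = 0$ the nearest decision boundary sits at $0.3$, and the bisection recovers exactly that distance.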
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.07262#davidstutz
**Deep Image Prior** by David Stutz (Thu, 28 Jun 2018)
Ulyanov et al. utilize untrained neural networks as regularizer/prior for various image restoration tasks such as denoising, inpainting and super-resolution. In particular, the standard formulation of such tasks, i.e.
$x^\ast = \arg\min_x E(x, x_0) + R(x)$
where $x_0$ is the input image and $E$ a task-dependent data term, is rephrased as follows:
$\theta^\ast = \arg\min_\theta E(f_\theta(z); x_0)$ and $x^\ast = f_{\theta^\ast}(z)$
for a fixed but random $z$. Here, the regularizer $R$ is esse...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.10925#davidstutz
**Adversarial Spheres** by David Stutz (Thu, 28 Jun 2018)
Gilmer et al. study the existence of adversarial examples on a synthetic toy dataset consisting of two concentric spheres. The dataset is created by randomly sampling examples from two concentric spheres, one with radius $1$ and one with radius $R = 1.3$. While the authors argue that different difficulty levels of the dataset can be created by varying $R$ and the dimensionality, they merely experiment with $R = 1.3$ and a dimensionality of $500$. The motivation to study this dataset comes from the ...
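The dataset itself is easy to reproduce in a few lines (a sketch under the stated setup: radii $1$ and $1.3$, dimensionality $500$):

```python
import math, random

# Sample a point uniformly from the sphere of given radius in d
# dimensions: normalize a Gaussian vector, then rescale.
def sample_sphere(d, radius, rng):
    v = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [radius * x / norm for x in v]

rng = random.Random(0)
inner = sample_sphere(500, 1.0, rng)   # class 0: radius 1
outer = sample_sphere(500, 1.3, rng)   # class 1: radius R = 1.3
```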
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02774#davidstutz
**Robustness of classifiers: from adversarial to random noise** by David Stutz (Thu, 28 Jun 2018)
Fawzi et al. study robustness in the transition from random samples to semi-random and adversarial samples. Specifically, they present bounds relating the norm of an adversarial perturbation to the norm of random perturbations – for the exact form I refer to the paper. Personally, I find the definition of semi-random noise most interesting, as it allows one to get an intuition for distinguishing random noise from adversarial examples. As in related literature, adversarial examples are defined as
...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.08967#davidstutz
**A Boundary Tilting Persepective on the Phenomenon of Adversarial Examples** by David Stutz (Thu, 28 Jun 2018)
Tanay and Griffin introduce the boundary tilting perspective as an alternative to the “linear explanation” of adversarial examples. Specifically, they argue that it is not reasonable to assume that the linearity in deep neural networks causes the existence of adversarial examples. Originally, Goodfellow et al. [1] explained the impact of adversarial examples by considering a linear classifier:
$w^T x' = w^Tx + w^T\eta$
where $\eta$ is the adversarial perturbation. In large dimensions, the s...
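The linear argument is easy to check numerically: a perturbation aligned with $\text{sign}(w)$ shifts the score $w^Tx$ by $\epsilon\|w\|_1$, which grows with the input dimension while each component of $\eta$ stays imperceptibly small (toy numbers below):

```python
# Toy check of the linear explanation: tiny per-component perturbation,
# large total shift in the classifier score. Weights are made up.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

d, eps = 1000, 0.01
w = [1.0] * d                                   # toy weight vector
x = [0.0] * d
eta = [eps if wi >= 0 else -eps for wi in w]    # eps * sign(w)
x_adv = [xi + ei for xi, ei in zip(x, eta)]
shift = dot(w, x_adv) - dot(w, x)
# ||eta||_inf == 0.01, yet the score shifts by eps * ||w||_1 == 10.0
```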
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.07690#davidstutz
**Certified Defenses against Adversarial Examples** by David Stutz (Thu, 28 Jun 2018)
Raghunathan et al. provide an upper bound on the adversarial loss of two-layer networks and also derive a regularization method to minimize this upper bound. In particular, the authors consider scoring functions $f^i(x) = V_i^T\sigma(Wx)$ with bounded derivative $\sigma'(z) \in [0,1]$, which holds for Sigmoid and ReLU activation functions. Still, the model is very constrained compared to recent, well-performing deep (convolutional) neural networks. The upper bound is then derived by considerin...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.09344#davidstutz
**Parseval Networks: Improving Robustness to Adversarial Examples** by David Stutz (Thu, 28 Jun 2018)
Cisse et al. propose Parseval networks, deep neural networks regularized to learn orthonormal weight matrices. Similar to the work by Hein et al. [1], the main idea is to constrain the Lipschitz constant of the network – which essentially means constraining the Lipschitz constant of each layer independently. For weight matrices, this can be achieved by constraining the matrix norm. However, this (depending on the norm used) is often intractable during gradient descent training. Therefore, Cis...
http://www.shortscience.org/paper?bibtexKey=conf/icml/CisseBGDU17#davidstutz
**Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality** by David Stutz (Thu, 28 Jun 2018)
Ma et al. detect adversarial examples based on their estimated local intrinsic dimensionality. I want to note that this work is similar to [1] – in both publications, local intrinsic dimensionality is used to analyze adversarial examples. Specifically, the intrinsic dimensionality of a sample is estimated based on the radii $r_i(x)$ of the $k$ nearest neighbors around a sample $x$:
$- \left(\frac{1}{k} \sum_{i = 1}^k \log \frac{r_i(x)}{r_k(x)}\right)^{-1}$.
For details regarding the original,...
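The estimator above is straightforward to implement from sorted neighbor radii (a minimal sketch of the formula; the radii below are made-up values chosen so the result is exact):

```python
import math

# Maximum-likelihood LID estimate from the k nearest-neighbor radii of
# a sample: -( (1/k) * sum_i log(r_i / r_k) )^{-1}, with the radii
# sorted ascending so r_k (the largest) is last. Assumes not all radii
# are equal, otherwise the sum is zero.
def lid(radii):
    k = len(radii)
    r_k = radii[-1]
    return -k / sum(math.log(r / r_k) for r in radii)

est = lid([math.exp(-1.0), math.exp(-0.5), 1.0])  # = 2.0
```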
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02613#davidstutz
**Detecting Adversarial Samples from Artifacts** by David Stutz (Wed, 27 Jun 2018)
Feinman et al. use dropout to compute an uncertainty measure that helps to identify adversarial examples. Their so-called Bayesian Neural Network Uncertainty is computed as follows:
$\frac{1}{T} \sum_{i=1}^T \hat{y}_i^T \hat{y}_i - \left(\frac{1}{T}\sum_{i=1}^T \hat{y}_i\right)^T\left(\frac{1}{T}\sum_{i=1}^T \hat{y}_i\right)$
where $\{\hat{y}_1,\ldots,\hat{y}_T\}$ is a set of stochastic predictions (i.e. predictions with different noise patterns in the dropout layers). Here, it can easily be seen that this measure co...
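The quantity is a summed predictive variance over the $T$ stochastic forward passes; a minimal sketch with made-up prediction vectors:

```python
# Variance-style uncertainty over T stochastic (dropout) predictions,
# each a probability vector: E[y^T y] - E[y]^T E[y].
def mc_dropout_uncertainty(preds):
    T = len(preds)
    mean_sq = sum(sum(p * p for p in y) for y in preds) / T
    mean = [sum(y[j] for y in preds) / T for j in range(len(preds[0]))]
    return mean_sq - sum(m * m for m in mean)

# Identical predictions -> zero uncertainty; disagreement -> positive.
low = mc_dropout_uncertainty([[0.9, 0.1]] * 5)
high = mc_dropout_uncertainty([[0.9, 0.1], [0.1, 0.9]])
```

Adversarial examples tend to sit in regions where the stochastic passes disagree, which is what makes this a usable detection signal.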
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.00410#davidstutz
**Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods** by David Stutz (Wed, 27 Jun 2018)
Carlini and Wagner study the effectiveness of adversarial example detectors as a defense strategy and show that most of them can be bypassed easily by known attacks. Specifically, they consider a set of adversarial example detection schemes, including neural networks as detectors and statistical tests. After extensive experiments, the authors provide a set of lessons which include:
- Randomization is by far the most effective defense (e.g. dropout).
- Defenses seem to be dataset-specific. There is...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07263#davidstutz
**On the (Statistical) Detection of Adversarial Examples** by David Stutz (Wed, 27 Jun 2018)
Grosse et al. use statistical tests to detect adversarial examples; additionally, machine learning algorithms are adapted to detect adversarial examples on the fly while performing classification. The idea of using statistical tests to detect adversarial examples is simple: assuming that there is a true data distribution, a machine learning algorithm can only approximate this distribution – i.e. each algorithm “learns” an approximate distribution. The ideal adversary uses this discrepancy to d...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.06280#davidstutz
Wed, 27 Jun 2018 21:08:28 -0600

Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients (arXiv:1711.09404), summary by David Stutz
Ross and Doshi-Velez propose input gradient regularization to improve the robustness and interpretability of neural networks. As the discussion of interpretability is quite limited in the paper, the main contribution is an extensive evaluation of input gradient regularization against adversarial examples, in comparison to defenses such as distillation or adversarial training. Specifically, input gradient regularization as proposed in [1] is used:
$\arg\min_\theta H(y,\hat{y}) + \lambda \|\nabla_x H(y,\ha...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09404#davidstutz
Wed, 27 Jun 2018 20:04:56 -0600

Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation (conf/nips/HeinA17), summary by David Stutz
Hein and Andriushchenko give an intuitive bound on the robustness of neural networks based on the local Lipschitz constant. By robustness, the authors refer to a small $\epsilon$-ball around each sample; this ball is supposed to describe the region where the neural network predicts a constant class. This means that adversarial attacks have to compute changes large enough to leave these robust regions. Larger $\epsilon$-balls imply higher robustness to adversarial examples.
When considering a singl...
http://www.shortscience.org/paper?bibtexKey=conf/nips/HeinA17#davidstutz
Wed, 27 Jun 2018 19:57:22 -0600

Adversarial Vulnerability of Neural Networks Increases With Input Dimension (arXiv:1802.01421), summary by David Stutz
Simon-Gabriel et al. study the robustness of neural networks with respect to the input dimensionality. Their main hypothesis is that the vulnerability of neural networks against adversarial perturbations increases with the input dimensionality. To support this hypothesis, they provide a theoretical analysis as well as experiments.
The general idea of robustness is that small perturbations $\delta$ of the input $x$ result only in small variations $\delta \mathcal{L}$ of the loss:
$\delta \ma...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.01421#davidstutz
Wed, 27 Jun 2018 19:41:53 -0600

Biologically inspired protection of deep networks from adversarial attacks (arXiv:1703.09202), summary by David Stutz
Nayebi and Ganguli propose saturating neural networks as a defense against adversarial examples. The main observation driving this paper can be stated as follows: neural networks are essentially based on linear sums of neurons (e.g. fully connected layers, convolutional layers) which are then activated; by injecting a small amount of noise per neuron it is possible to shift the final sum by large values, thereby propagating the noise through the network and fooling the network into misclassifying a...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.09202#davidstutz
Wed, 27 Jun 2018 19:25:51 -0600

Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks (arXiv:1704.01155), summary by David Stutz
Xu et al. propose feature squeezing for detecting and defending against adversarial examples. In particular, they consider “squeezing” the bit depth of the input images as well as local and non-local smoothing (Gaussian, median filtering etc.). In experiments they show that feature squeezing preserves accuracy while defending against adversarial examples. Figure 1 additionally shows an illustration of how feature squeezing can be used to detect adversarial examples.
Figure 1: Illustration ...
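Bit-depth squeezing and the detection idea can be sketched as follows (a hedged illustration, not the authors' code; `predict` is a stand-in for the classifier):

```python
import numpy as np

# Round pixels in [0, 1] to `bits` bits, destroying the low-amplitude
# structure many adversarial perturbations rely on.
def squeeze_bit_depth(x, bits):
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

# Detection idea from the summary: a large gap between the model's output on
# the original input and on its squeezed version flags a likely adversarial
# example.
def squeezing_gap(predict, x, bits=3):
    return float(np.abs(predict(x) - predict(squeeze_bit_depth(x, bits))).sum())

x = np.array([0.0, 0.12, 0.4, 0.87, 1.0])
x_squeezed = squeeze_bit_depth(x, 1)  # 1 bit: every pixel snaps to 0 or 1
```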
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.01155#davidstutz
Wed, 27 Jun 2018 19:17:53 -0600

Generative adversarial networks uncover epidermal regulators and predict single cell perturbations (DOI 10.1101/262501), summary by David Stutz
Lee et al. propose a variant of adversarial training where a generator is trained simultaneously to generate adversarial perturbations. This approach follows the idea that it is possible to “learn” how to generate adversarial perturbations (as in [1]). In this case, the authors use the gradient of the classifier with respect to the input as a hint for the generator. Both generator and classifier are then trained in an adversarial setting (analogously to generative adversarial networks), see t...
http://www.shortscience.org/paper?bibtexKey=10.1101/262501#davidstutz
Wed, 27 Jun 2018 19:08:46 -0600

Certifying Some Distributional Robustness with Principled Adversarial Training (arXiv:1710.10571), summary by David Stutz
Sinha et al. introduce a variant of adversarial training based on distributionally robust optimization. I strongly recommend reading the paper for understanding the introduced theoretical framework. The authors also provide guarantees on the obtained adversarial loss – and show experimentally that this guarantee is a realistic indicator. The adversarial training variant itself follows the general strategy of training on adversarially perturbed training samples in a min-max framework. In each ite...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.10571#davidstutz
Wed, 27 Jun 2018 19:00:07 -0600

Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization (arXiv:1511.05432), summary by David Stutz
Shaham et al. provide an interpretation of adversarial training in the context of robust optimization. In particular, adversarial training is posed as a min-max problem (similar to other related work, as I found):
$\min_\theta \sum_i \max_{r \in U_i} J(\theta, x_i + r, y_i)$
where $U_i$ is called the uncertainty set corresponding to sample $x_i$ – in the context of adversarial examples, this might be an $\epsilon$-ball around the sample quantifying the maximum perturbation allowed; $(x_i, y_i)...
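One stochastic step of the min-max objective above can be sketched as follows. This is a hedged illustration, not the paper's algorithm: the uncertainty set $U_i$ is taken to be an $L_\infty$ $\epsilon$-ball, the inner maximization is approximated by a single gradient-sign step, and `grad_J_x` / `grad_J_theta` are hypothetical helpers returning the loss gradient w.r.t. the input and the parameters.

```python
import numpy as np

def adversarial_training_step(theta, x, y, grad_J_x, grad_J_theta,
                              eps=0.1, lr=0.01):
    r = eps * np.sign(grad_J_x(theta, x, y))           # approximate inner max
    return theta - lr * grad_J_theta(theta, x + r, y)  # outer min via SGD

# Toy squared loss J(theta, x, y) = (theta . x - y)^2 for illustration.
grad_J_x = lambda theta, x, y: 2 * (theta @ x - y) * theta
grad_J_theta = lambda theta, x, y: 2 * (theta @ x - y) * x
```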
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.05432#davidstutz
Wed, 27 Jun 2018 18:53:50 -0600

Learning with a Strong Adversary (arXiv:1511.03034), summary by David Stutz
Huang et al. propose a variant of adversarial training called “learning with a strong adversary”. In spirit, the idea is similar to related work [1]. In particular, the authors consider the min-max objective
$\min_g \sum_i \max_{\|r^{(i)}\|\leq c} l(g(x_i + r^{(i)}), y_i)$
where $g$ ranges over expressible functions and $(x_i, y_i)$ is a training sample. In the remainder of the paper, Huang et al. address the problem of efficiently computing $r^{(i)}$ – i.e. a strong adversarial exam...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.03034#davidstutz
Wed, 27 Jun 2018 18:47:59 -0600

Distributional Smoothing with Virtual Adversarial Training (arXiv:1507.00677), summary by David Stutz
Miyato et al. propose distributional smoothing (or virtual adversarial training) as a defense against adversarial examples. However, I think that neither term gives a good intuition of what is actually done. Essentially, a regularization term is introduced. Letting $p(y|x,\theta)$ be the learned model, the regularizer is expressed as
$\text{KL}(p(y|x,\theta)\,\|\,p(y|x+r,\theta))$
where $r$ is the perturbation that maximizes the Kullback-Leibler divergence above, i.e.
$r = \arg\max_r \{\text{KL}(...
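To make the regularizer concrete, here is a sketch under loud assumptions: the paper approximates the inner maximization over $r$ analytically, while this illustration finds $r$ by a crude random search over a small ball, purely to show what is being computed; the toy `predict` model is hypothetical.

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def vat_regularizer(predict, x, radius=0.1, trials=100, seed=0):
    rng = np.random.default_rng(seed)
    p = predict(x)
    best = 0.0
    for _ in range(trials):
        r = rng.normal(size=x.shape)
        r = radius * r / np.linalg.norm(r)       # sample on the radius-sphere
        best = max(best, kl(p, predict(x + r)))  # approximate max_r KL(p || p_r)
    return best

# Toy model: softmax over a linear score, just to exercise the regularizer.
predict = lambda x: softmax(np.array([x @ np.array([1.0, -1.0]), 0.0]))
reg = vat_regularizer(predict, np.array([0.5, 0.2]))
```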
http://www.shortscience.org/paper?bibtexKey=journals/corr/1507.00677#davidstutz
Wed, 27 Jun 2018 18:43:14 -0600

Efficient Defenses Against Adversarial Attacks (arXiv:1707.06728), summary by David Stutz
Zantedeschi et al. propose Gaussian data augmentation in conjunction with bounded $\text{ReLU}$ activations as a defense strategy against adversarial examples. Here, Gaussian data augmentation refers to the practice of adding Gaussian noise to the input during training.
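The two ingredients can be sketched as follows (illustrative, not the paper's code): Gaussian noise added to the input at training time, and a bounded ReLU that saturates at a threshold $t$, so injected noise cannot shift downstream activations arbitrarily far.

```python
import numpy as np

def bounded_relu(x, t=1.0):
    # Clip below at 0 like a regular ReLU, but also saturate above at t.
    return np.minimum(np.maximum(x, 0.0), t)

def gaussian_augment(x, sigma=0.1, rng=None):
    # Training-time Gaussian data augmentation: add noise to the input.
    if rng is None:
        rng = np.random.default_rng(0)
    return x + sigma * rng.normal(size=x.shape)

z = np.array([-2.0, 0.3, 0.9, 5.0])
out = bounded_relu(z)  # negatives clip to 0, large activations saturate at t
```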
http://www.shortscience.org/paper?bibtexKey=journals/corr/1707.06728#davidstutz
Wed, 27 Jun 2018 18:29:49 -0600

Ensemble Robustness of Deep Learning Algorithms (arXiv:1602.02389), summary by David Stutz
Zahavy et al. introduce the concept of ensemble robustness and show that it can be used as an indicator for generalization performance. In particular, the main idea is to lift the concept of robustness against adversarial examples to ensembles of networks – as trained, e.g., through Dropout or Bayes-by-Backprop. Letting $Z$ denote the sample set, a learning algorithm is $(K, \epsilon)$ robust if $Z$ can be divided into $K$ disjoint sets $C_1,\ldots,C_K$ such that for every training set $s_1,\ldots,s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.02389#davidstutz
Wed, 27 Jun 2018 18:18:37 -0600

Towards Robust Neural Networks via Random Self-ensemble (arXiv:1712.00673), summary by David Stutz
Liu et al. propose randomizing neural networks, implicitly learning an ensemble of models, to defend against adversarial attacks. In particular, they introduce Gaussian noise layers before regular convolutional layers. The noise can be seen as an additional parameter of the model. During training, noise is randomly added. During testing, the model is evaluated on a single testing input using multiple random noise vectors; this essentially corresponds to an ensemble of different models (parameterize...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.00673#davidstutz
Wed, 27 Jun 2018 18:07:27 -0600

Towards Reverse-Engineering Black-Box Neural Networks (arXiv:1711.01768), summary by David Stutz
Oh et al. propose two different approaches for whitening black-box neural networks, i.e. predicting details of their internals such as architecture or training procedure. In particular, they consider attributes regarding architecture (activation function, dropout, max pooling, kernel size of convolutional layers, number of convolutional/fully connected layers etc.), attributes concerning optimization (batch size and optimization algorithm) and attributes regarding the data (data split and size)...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.01768#davidstutz
Wed, 27 Jun 2018 17:59:09 -0600

Comment on "Biologically inspired protection of deep networks from adversarial attacks" (arXiv:1704.01547), summary by David Stutz
Brendel et al. propose a decision-based black-box attack against (deep convolutional) neural networks. Specifically, the so-called Boundary Attack starts with a random adversarial example (i.e. random noise that is not classified as the image to be attacked) and randomly perturbs this initialization to move closer to the target image while remaining misclassified. In pseudo code, the algorithm is described in Algorithm 1. The key component is the proposal distribution $P$ used to guide the adversar...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.01547#davidstutz
Tue, 26 Jun 2018 21:40:59 -0600

ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute Models (arXiv:1708.03999), summary by David Stutz
Chen et al. propose a black-box attack that estimates gradients via zeroth-order optimization to compute adversarial examples. Specifically, they follow the general idea of [1] where the following objective is optimized:
$\min_x \|x - x_0\|_2 + c \max\{\max_{i\neq t}\{z_i\} - z_t, -\kappa\}$.
Here, $x$ is the adversarial example based on training sample $x_0$. The second part expresses that $x$ is supposed to be misclassified, i.e. the logit $z_i$ for some $i \neq t$ distinct from the true label $t$ is supposed to be larger tha...
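The zeroth-order idea behind the attack can be sketched as follows (hedged: `f` is a toy stand-in for the black-box objective, and ZOO's batching and coordinate subsampling are omitted). Without access to gradients, $\partial f/\partial x_i$ is estimated by symmetric finite differences, one coordinate at a time.

```python
import numpy as np

def zeroth_order_gradient(f, x, h=1e-4):
    # Estimate the gradient of a black-box scalar function f at x.
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

f = lambda x: float((x ** 2).sum())  # toy black-box objective
x = np.array([1.0, -2.0, 3.0])
g = zeroth_order_gradient(f, x)      # close to the true gradient 2x
```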
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.03999#davidstutz
Tue, 26 Jun 2018 21:25:44 -0600

Adversarial Robustness: Softmax versus Openmax (arXiv:1708.01697), summary by David Stutz
Rozsa et al. describe an adversarial attack against OpenMax [1] by directly targeting the logits. Specifically, they assume a network using OpenMax instead of a SoftMax layer to compute the final class probabilities. OpenMax enables “open-set” networks by also allowing input samples to be rejected. By directly targeting the logits of the trained network, i.e. iteratively pushing the logits in a target direction, it does not matter whether SoftMax or OpenMax layers are used on top; the network can b...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.01697#davidstutz
Tue, 26 Jun 2018 21:19:55 -0600

The Limitations of Deep Learning in Adversarial Settings (arXiv:1511.07528), summary by David Stutz
Papernot et al. introduce a novel attack on deep networks based on so-called adversarial saliency maps that are computed independently of a loss. Specifically, they consider, for a given network $F(X)$, the forward derivative
$\nabla F = \frac{\partial F}{\partial X} = \left[\frac{\partial F_j(X)}{\partial x_i}\right]_{i,j}$.
Essentially, this is the regular derivative of $F$ with respect to its input; Papernot et al. refer to it as the “forward” derivative as it stands in contra...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.07528#davidstutz
Tue, 26 Jun 2018 21:14:29 -0600

A Rotation and a Translation Suffice: Fooling CNNs with Simple Transformations (arXiv:1712.02779), summary by David Stutz
Engstrom et al. demonstrate that spatial transformations such as translations and rotations can be used to generate adversarial examples. Personally, however, I think that the paper does not address the question of where adversarial perturbations “end” and generalization issues “start”. For larger translations and rotations, the problem is clearly a problem of generalization. Small ones could also be interpreted as adversarial perturbations – especially when they are computed under the in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.02779#davidstutz
Tue, 26 Jun 2018 21:05:51 -0600

Adversarial examples in the physical world (arXiv:1607.02533), summary by David Stutz
Kurakin et al. demonstrate that adversarial examples are also a concern in the physical world. Specifically, adversarial examples are crafted digitally and then printed to see if the classification network, running on a smartphone, still misclassifies the examples. In many cases, adversarial examples are still able to fool the network, even after printing.
Figure 1: Illustration of the experimental setup.
Also find this summary at [davidstutz.de]().
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.02533#davidstutz
Tue, 26 Jun 2018 21:01:38 -0600

Deep contextualized word representations (arXiv:1802.05365), summary by mnoukhov
This paper introduces a deep universal word embedding based on a bidirectional LM (in this case, a biLSTM). First, words are embedded with a CNN-based, character-level, context-free token embedding into $x_k^{LM}$, and then each sentence is parsed using a biLSTM, maximizing the log-likelihood of a word given its forward and backward context (much like a normal language model).
The innovation is in taking the output of each layer of the LSTM ($h_{k,j}^{LM}$ being the output at layer $j$)
$...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.05365#mnoukhov
Tue, 26 Jun 2018 20:56:47 -0600

NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles (arXiv:1707.03501), summary by David Stutz
Lu et al. present experiments regarding adversarial examples in the real world, i.e. after printing them. Personally, I find it interesting that researchers are studying how networks can be fooled by physically perturbing images. For me, one of the main conclusions is that it is very hard to evaluate the robustness of networks against physical perturbations. Often it is unclear whether changed lighting conditions, distances or viewpoints to objects might cause the network to fail – which means...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1707.03501#davidstutz
Tue, 26 Jun 2018 20:56:04 -0600

Adversarial Machine Learning at Scale (arXiv:1611.01236), summary by David Stutz
Kurakin et al. present some larger-scale experiments using adversarial training on ImageNet to increase robustness. In particular, they claim to be the first to use adversarial training on ImageNet. Furthermore, they provide experiments underlining the following conclusions:
- Adversarial training can also be seen as a regularizer. This, however, is not surprising, as training on noisy training samples is also known to act as regularization.
- Label leaking describes the observation that an adversar...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.01236#davidstutz
Tue, 26 Jun 2018 20:53:02 -0600

Delving into Transferable Adversarial Examples and Black-box Attacks (arXiv:1611.02770), summary by David Stutz
Liu et al. provide a comprehensive study on the transferability of adversarial examples considering different attacks and models on ImageNet. In their experiments, they consider both targeted and non-targeted attacks and also provide a real-world example by attacking clarifai.com. Here, I want to list some interesting conclusions drawn from their experiments:
- Non-targeted attacks easily transfer between models; targeted attacks, in contrast, generally do not transfer – meaning that the target...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.02770#davidstutz
Tue, 26 Jun 2018 20:45:23 -0600

Universal adversarial perturbations (arXiv:1610.08401), summary by David Stutz
Moosavi-Dezfooli et al. propose universal adversarial perturbations, i.e. perturbations that are image-agnostic. Specifically, they extend the framework for crafting adversarial examples, i.e. by iteratively solving
$\arg\min_r \|r \|_2$ s.t. $f(x + r) \neq f(x)$.
Here, $r$ denotes the adversarial perturbation, $x$ a training sample and $f$ the neural network. Instead of solving this problem for a specific $x$, the authors propose to solve the problem over the full training set, i.e. in each ite...
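The accumulation over the training set can be sketched as follows (hedged: `per_sample_attack` stands in for the per-sample minimal perturbation, e.g. a DeepFool step, and is hypothetical here; `xi` bounds the norm of the universal perturbation $v$):

```python
import numpy as np

def universal_perturbation(f, X, per_sample_attack, xi=10.0, epochs=5):
    v = np.zeros_like(X[0])
    for _ in range(epochs):
        for x in X:
            if f(x + v) == f(x):                     # v does not yet fool f on x
                v = v + per_sample_attack(f, x + v)  # push past the boundary
                norm = np.linalg.norm(v)
                if norm > xi:                        # project back onto the xi-ball
                    v *= xi / norm
    return v

# Toy threshold classifier and per-sample attack to exercise the loop.
f = lambda x: int(x[0] > 0)
attack = lambda f, x: np.array([-x[0] - 0.1 if x[0] > 0 else -x[0] + 0.1, 0.0])
X = [np.array([1.0, 0.0]), np.array([2.0, 0.0])]
v = universal_perturbation(f, X, attack)
```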
http://www.shortscience.org/paper?bibtexKey=journals/corr/1610.08401#davidstutz
Tue, 26 Jun 2018 20:39:55 -0600

Towards Evaluating the Robustness of Neural Networks (arXiv:1608.04644), summary by David Stutz
Carlini and Wagner propose three novel methods/attacks for adversarial examples and show that defensive distillation is not effective. In particular, they devise attacks for all three commonly used norms $L_1$, $L_2$ and $L_\infty$, which are used to measure the deviation of the adversarial perturbation from the original testing sample. In the course of the paper, starting with the targeted objective
$\min_\delta d(x, x + \delta)$ s.t. $f(x + \delta) = t$ and $x+\delta \in [0,1]^n$,
they cons...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.04644#davidstutz
Tue, 26 Jun 2018 20:24:14 -0600

Towards Deep Learning Models Resistant to Adversarial Attacks (arXiv:1706.06083), summary by David Stutz
Madry et al. provide an interpretation of training on adversarial examples as a saddle-point (i.e. min-max) problem. Based on this formulation, they conduct several experiments on MNIST and CIFAR-10 supporting the following conclusions:
- Projected gradient descent might be the “strongest” adversary using first-order information. Here, gradient descent is used to maximize the loss of the classifier directly while always projecting onto the set of “allowed” perturbations (e.g. within an $\epsil...
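The projected-gradient attack described in the bullet above can be sketched in a few lines (a minimal illustration, assuming numpy only, with `grad_loss` standing in for the loss gradient w.r.t. the input; the "allowed" set is an $L_\infty$ $\epsilon$-ball around $x_0$ intersected with valid pixel values $[0, 1]$):

```python
import numpy as np

def pgd_attack(grad_loss, x0, epsilon=0.1, alpha=0.02, steps=20):
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(grad_loss(x))       # ascend the loss
        x = np.clip(x, x0 - epsilon, x0 + epsilon)  # project onto the ball
        x = np.clip(x, 0.0, 1.0)                    # stay a valid image
    return x

x0 = np.array([0.5, 0.95])
x_adv = pgd_attack(lambda x: np.ones_like(x), x0)   # toy all-ones gradient
```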
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.06083#davidstutz
Tue, 26 Jun 2018 20:08:20 -0600

Explaining and Harnessing Adversarial Examples (arXiv:1412.6572), summary by David Stutz
Goodfellow et al. introduce the fast gradient sign method (FGSM) to craft adversarial examples and further provide a possible interpretation of adversarial examples considering linear models. FGSM is a gradient-based, one-step method for generating adversarial examples. In particular, letting $J$ be the objective optimized during training and $\epsilon$ be the maximum $\infty$-norm of the adversarial perturbation, FGSM computes
$x' = x + \eta = x + \epsilon \text{sign}(\nabla_x J(x, y))$
where $y...
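The formula above is a single signed gradient step; as a sketch (with `grad_J` standing in for $\nabla_x J(x, y)$, which would come from autodiff in practice):

```python
import numpy as np

def fgsm(x, grad_J, epsilon=0.1):
    # One step of epsilon times the sign of the loss gradient.
    return x + epsilon * np.sign(grad_J)

x = np.array([0.2, 0.7, 0.4])
grad_J = np.array([0.3, -1.2, 0.0])
x_adv = fgsm(x, grad_J)
```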
http://www.shortscience.org/paper?bibtexKey=journals/corr/1412.6572#davidstutz
Tue, 26 Jun 2018 20:02:41 -0600

Ensemble Adversarial Training: Attacks and Defenses (arXiv:1705.07204), summary by David Stutz
Tramèr et al. introduce both a novel adversarial attack as well as a defense mechanism against black-box attacks termed ensemble adversarial training. I first want to highlight that, in addition to the proposed methods, the paper gives a very good discussion of state-of-the-art attacks as well as defenses and how to put them into context. Tramèr et al. consider black-box attacks, focusing on transferable adversarial examples. Their main observation is as follows: one-shot attacks (i.e....
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07204#davidstutz
Tue, 26 Jun 2018 19:56:11 -0600

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks (arXiv:1511.04508), summary by David Stutz
Papernot et al. build upon the idea of network distillation [1] and propose a simple mechanism to defend networks against adversarial attacks. The main idea of distillation, originally introduced to “distill” the knowledge of very deep networks into smaller ones, is to train a second, possibly smaller network with the probability distributions of the original, possibly larger network as supervision. Papernot et al., as well as the authors of [1], argue that the probability distributions...
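At the heart of distillation is a temperature-scaled softmax, which can be sketched as follows: logits are divided by a temperature $T$ before the softmax, so the teacher's probability distribution is softened and carries more inter-class information for the student to learn from.

```python
import numpy as np

def softmax_with_temperature(logits, T=20.0):
    z = logits / T
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([10.0, 2.0, 1.0])
hard = softmax_with_temperature(logits, T=1.0)   # near one-hot
soft = softmax_with_temperature(logits, T=20.0)  # softened targets
```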
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.04508#davidstutz
Tue, 26 Jun 2018 18:29:02 -0600

Training Region-based Object Detectors with Online Hard Example Mining (arXiv:1604.03540), summary by RyanDsouza
The problem this paper tries to address is that the training set is distinguished by a large imbalance between the number of foreground examples and background examples. To make the point concrete: in cases like sliding-window object detectors such as the deformable parts model, the imbalance may be as extreme as 100,000 background examples to one annotated foreground example.
Before I proceed to give you the details of Hard Example Mining, I just want to note that HEM in its essence is mostly w...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1604.03540#ryandsouza
Tue, 26 Jun 2018 14:30:07 -0600

Prototypical Networks for Few-shot Learning (arXiv:1703.05175), summary by CodyWild
This paper describes an architecture designed for generating class predictions based on a set of features in situations where you may only have a few examples per class, or even where you see entirely new classes at test time. Some prior work has approached this problem in ridiculously complex fashion, up to and including training a network to predict the gradient outputs of a meta-network that it thinks would best optimize loss, given a new class. The method of Prototypical Networks prides its...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.05175#decodyng
Tue, 26 Jun 2018 05:00:56 -0600

Word Translation Without Parallel Data (arXiv:1710.04087), summary by CodyWild
The core goal of this paper is to perform in an unsupervised (read: without parallel texts) way what other machine translation researchers had previously only effectively performed in a supervised way: the creation of a word-to-word translational mapping between natural languages. To frame the problem concretely: the researchers start with word embeddings learned in each language independently, and their desired output is a set of nearest neighbors for a source word that contains the true target...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.04087#decodyng
Tue, 26 Jun 2018 04:58:44 -0600

Born Again Neural Networks (arXiv:1805.04770), summary by CodyWild
A finding first publicized by Geoff Hinton is the fact that, when you train a simple, lower-capacity model on the probability outputs of another model, you can often get a model that has comparable performance, despite that lowered capacity. Another, even more interesting finding is that, if you take a trained model, and train a model with identical structure on its probability outputs, you can often get a model with better performance than the original teacher, with quicker convergence.
This ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.04770#decodyng
Tue, 26 Jun 2018 04:46:03 -0600

Image Transformer (arXiv:1802.05751), summary by CodyWild
Last year, a machine translation paper came out, with an unfortunately un-memorable name (the Transformer network) and a dramatic proposal for sequence modeling that eschewed both Recurrent NN and Convolutional NN structures and, instead, used self-attention as its mechanism for “remembering” or aggregating information from across an input. Earlier this month, the same authors released an extension of that earlier paper, called Image Transformer, that applies the same attention-only approa...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.05751#decodyng
Tue, 26 Jun 2018 04:45:23 -0600

Inverse Reward Design (arXiv:1711.02827), summary by CodyWild
It’s a commonly understood problem in Reinforcement Learning that it is difficult to fully specify your exact reward function for an agent you’re training, especially when that agent will need to operate in conditions potentially different than those it was trained in. The canonical example of this, used throughout the Inverse Reward Design paper, is that of an agent trained on an environment of grass and dirt, that now encounters an environment with lava. In a typical problem setup, the ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.02827#decodyng
Tue, 26 Jun 2018 04:44:50 -0600

Translating Neuralese (arXiv:1704.06960), summary by CodyWild
This paper has an unusual and interesting goal, compared to those I more typically read: it wants to develop a “translation” between the messages produced by a model, and natural language used by a human. More specifically, the paper seeks to do this in the context of a two-player game, where one player needs to communicate information to the other. A few examples of this are:
- Being shown a color, and needing to communicate to your partner so they can choose that color
- Driving, in an ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.06960#decodyng
Tue, 26 Jun 2018 04:44:18 -0600

How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) (arXiv:1805.11604), summary by CodyWild
At NIPS 2017, Ali Rahimi was invited on stage to give a keynote after a paper he was on received the “Test of Time” award. While there, in front of several thousand researchers, he gave an impassioned argument for more rigor: more small problems to validate our assumptions, more visibility into why our optimization algorithms work the way they do. The now-famous catchphrase of the talk was “alchemy”; he argued that the machine learning community has been effective at finding things that ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.11604#decodyng
Tue, 26 Jun 2018 04:42:50 -0600

Group Normalization (arXiv:1803.08494), summary by CodyWild
If you were to survey researchers, and ask them to name the 5 most broadly influential ideas in Machine Learning from the last 5 years, I’d bet good money that Batch Normalization would be somewhere on everyone’s lists. Before Batch Norm, training meaningfully deep neural networks was an unstable process, and one that often took a long time to converge to success. When we added Batch Norm to models, it allowed us to increase our learning rates substantially (leading to quicker training) with...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.08494#decodyng
Tue, 26 Jun 2018 04:42:07 -0600

The unreasonable effectiveness of the forget gate (arXiv:1804.04849), summary by CodyWild
I have a lot of fondness for this paper as a result of its impulse towards clear explanations, simplicity, and pushing back against complexity for complexity’s sake. The goal of the paper is pretty straightforward. Long Short Term Memory networks (LSTM) work by having a memory vector, and pulling information into and out of that vector through a gating system. These gates take as input the context of the network at a given timestep (the prior hidden state, and the current input), apply weight ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.04849#decodyng
Tue, 26 Jun 2018 04:40:30 -0600

Evolved Policy Gradients (arXiv:1802.04821), summary by CodyWild
The general goal of meta-learning systems is to learn useful shared structure across a broad distribution of tasks, in such a way that learning on a new task can be faster. Some of the historical ways this has been done have been through initializations (i.e. initializing the network at a point such that it is easy to further optimize on each individual task, drawn from some distribution of tasks), and recurrent network structures (where you treat the multiple timesteps of a recurrent network as...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.04821#decodyng
Tue, 26 Jun 2018 04:39:50 -0600

Differentiable plasticity: training plastic neural networks with backpropagation (arXiv:1804.02464)
Summary by CodyWild

Meta-learning is an area sparking a lot of research curiosity these days. It's framed in different ways: models that can adapt, models that learn to learn, models that can learn a new task quickly. This paper uses a somewhat different lens: that of neural plasticity, and argues that applying the concept to modern neural networks will give us an effective, biologically inspired way of building adaptable models. The basic premise of plasticity from a neurobiology perspective (at least how it...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02464#decodyng
Tue, 26 Jun 2018 04:39:08 -0600

Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments (arXiv:1710.03641)
Summary by CodyWild

DeepMind's recently released paper (one of a boatload coming out in the wake of ICLR, which just finished in Vancouver) addresses the problem of building an algorithm that can perform well on tasks that don't just stay fixed in their definition, but instead evolve and change, without giving the agent a chance to re-train in the middle. An example, used at various points in the paper, is an agent trying to run East that finds two of its legs (a different two each time) slowly ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.03641#decodyng
Tue, 26 Jun 2018 04:38:18 -0600

Dual Learning for Machine Translation (arXiv:1611.00179)
Summary by CodyWild

The problem setting of the paper is the desire to perform translation in a monolingual setting, where datasets exist for each language independently, but there is little or no paired sentence data (paired here meaning that you know you have the same sentence or text in both languages). The paper outlines the prior methods in this area as being, first, training a single-language language model (i.e. training a model to take in a sentence and return how coherent a sentence it is in a given language) and us...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.00179#decodyng
Tue, 26 Jun 2018 04:37:32 -0600

Enriching Word Vectors with Subword Information (arXiv:1607.04606)
Summary by CodyWild

This paper is a clever but conceptually simple idea for improving the vectors learned for individual words. In the proposed approach, instead of learning a distinct vector per word in the vocabulary, the model views a word as being composed of overlapping n-grams, which are combined to make the full word.
Recall: in the canonical skipgram approach to learning word embeddings, each word is represented by a single vector. The word might be tokenized first (for example, de-pluralized), but, funda...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.04606#decodyng
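The subword decomposition described above is easy to sketch. The boundary markers and the 3-to-6 character range follow the fastText convention the paper introduces; the helper name is mine.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """FastText-style subword units: pad the word with boundary markers,
    then take all character n-grams. The word's vector is then the sum of
    its n-gram vectors (plus one for the full padded word itself)."""
    padded = "<" + word + ">"
    grams = {padded}
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams

print(sorted(char_ngrams("where", 3, 3)))
# → ['<wh', '<where>', 'ere', 'her', 're>', 'whe']
```

Because rare or unseen words still share n-grams with common ones ("where" and "here" share "her" and "ere"), the model can produce sensible vectors even for out-of-vocabulary words.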
Tue, 26 Jun 2018 04:34:37 -0600

Learned in Translation: Contextualized Word Vectors (arXiv:1708.00107)
Summary by CodyWild

This paper's approach goes a step further away from the traditional word embedding approach - of training embeddings as the lookup-table first layer of an unsupervised monolingual network - and proposes a more holistic form of transfer learning that involves transferring not just learned knowledge contained in a set of vectors, but a fully trained model.
Transfer learning is the general idea of using part or all of a network trained on one task to perform a different task. The most comm...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.00107#decodyng
Tue, 26 Jun 2018 04:32:53 -0600

Embedding Word Similarity with Neural Machine Translation (arXiv:1412.6448)
Summary by CodyWild

If you've been paying any attention to the world of machine learning in the last five years, you've likely seen everyone's favorite example of how Word2Vec word embeddings work: king - man + woman = queen. Given the ubiquity of Word2Vec, and similar unsupervised embeddings, it can be easy to start thinking of them as the canonical definition of what a word embedding *is*. But that's a little oversimplified. In the context of machine learning, an embedding layer simply means any layer st...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1412.6448#decodyng
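The king - man + woman = queen arithmetic mentioned above is usually resolved with cosine similarity over the vocabulary. Here is a toy version; the two-dimensional vectors are invented purely for illustration, not learned embeddings.

```python
import numpy as np

def analogy(a, b, c, vocab):
    """Return the word whose vector is most cosine-similar to
    vec(b) - vec(a) + vec(c), excluding the query words themselves."""
    target = vocab[b] - vocab[a] + vocab[c]
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in vocab if w not in {a, b, c}),
               key=lambda w: cos(vocab[w], target))

# Tiny hand-built space: axis 0 ~ gender, axis 1 ~ royalty.
vocab = {"man": np.array([1.0, 0.0]), "woman": np.array([-1.0, 0.0]),
         "king": np.array([1.0, 1.0]), "queen": np.array([-1.0, 1.0]),
         "apple": np.array([0.0, -1.0])}
print(analogy("man", "king", "woman", vocab))  # → queen
```

In a real embedding space the same procedure runs over tens of thousands of words, and whether the analogy resolves cleanly depends heavily on how the embeddings were trained, which is exactly the kind of property this paper probes.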
Tue, 26 Jun 2018 04:31:28 -0600

Implicit Autoencoders (arXiv:1805.09804)
Summary by CodyWild

This paper outlines (yet another) variation on a variational autoencoder (VAE), which is, at a high level, a model that seeks to 1) learn to construct realistic samples from the data distribution, and 2) capture meaningful information about the data within its latent space. The "latent space" is a way of referring to the information bottleneck that happens when you compress the input (typically for these examples: an image) into a low-dimensional vector, before trying to predict that input ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.09804#decodyng
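The "information bottleneck" the summary refers to is, in the standard VAE this paper builds on, a sampled low-dimensional code. The usual reparameterized sampling step looks like this; the latent dimension is an arbitrary choice for illustration, and this shows the generic VAE machinery, not this paper's variation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var):
    """Standard VAE reparameterization: z = mu + sigma * eps keeps the
    sampling step differentiable with respect to the encoder's outputs."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.zeros(8)        # encoder-predicted mean of the latent code
log_var = np.zeros(8)   # encoder-predicted log-variance
z = sample_latent(mu, log_var)  # the 8-dimensional bottleneck code
```

The decoder then has to reconstruct the full input from `z` alone, which is what pressures the latent space to carry meaningful information.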
Tue, 26 Jun 2018 04:29:50 -0600

Associative Compression Networks for Representation Learning (arXiv:1804.02476)
Summary by CodyWild

These days, a bulk of recent work on Variational AutoEncoders - a type of generative model - focuses on the question of how to add recently designed, powerful decoders (the part that maps from the compressed information bottleneck to the reconstruction) to VAEs, while still causing them to capture high-level, conceptual information within the aforementioned information bottleneck (also known as a latent code). In the status quo, it's the case that the decoder can do well enough even without conditi...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02476#decodyng
Tue, 26 Jun 2018 04:28:23 -0600

Autoencoding beyond pixels using a learned similarity metric (arXiv:1512.09300)
Summary by CodyWild

Variational Autoencoders are a type of generative model that seeks to learn how to generate new data by incentivizing the model to be able to reconstruct input data after compressing it to a low-dimensional space. Typically, the way the reconstruction is scored against the original is by comparing pixel-by-pixel values: a reconstruction gets a high score if it is able to place pixels of color in the same places that the original did. However, there are compelling reasons why this is a s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1512.09300#decodyng
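The weakness of pixel-by-pixel scoring that motivates the paper's learned similarity metric can be seen numerically: a one-pixel shift of an identical shape, which a human would call a near-perfect reconstruction, incurs a large pixel-wise error. The 8x8 toy image is my own illustration.

```python
import numpy as np

# An 8x8 "image" with a single vertical stroke, and the same stroke
# shifted right by one pixel, which is perceptually near-identical.
img = np.zeros((8, 8))
img[:, 3] = 1.0
shifted = np.roll(img, 1, axis=1)

pixel_mse = float(((img - shifted) ** 2).mean())
print(pixel_mse)  # → 0.25: as bad as if a quarter of the pixels were wrong
```

A similarity metric computed in the feature space of a discriminator network, as the paper proposes, is far less sensitive to this kind of small spatial perturbation.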
Tue, 26 Jun 2018 04:26:46 -0600

Neural Discrete Representation Learning (arXiv:1711.00937)
Summary by CodyWild

There are mathematicians, still today, who look at deep learning and get real salty over the lack of convex optimization. That is to say: convex functions are ones where you have an actual guarantee that gradient descent will converge, and mathematicians of olden times (i.e. 2006) spent reams of paper arguing that this or that function had convex properties, and thus could be guaranteed to converge, under this or that set of arcane conditions. And then Deep Learning came along, with its huge...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.00937#decodyng
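The discrete bottleneck at the heart of this paper (the VQ-VAE) snaps each continuous encoder output to its nearest vector in a learned codebook. A minimal sketch of that quantization step; the codebook size and dimensionality here are arbitrary, and the straight-through gradient trick the paper uses for training is omitted.

```python
import numpy as np

def quantize(z, codebook):
    """Map continuous vectors z (..., D) to their nearest entries in a
    codebook (K, D), yielding discrete latent indices."""
    d2 = ((z[..., None, :] - codebook) ** 2).sum(axis=-1)  # (..., K)
    idx = d2.argmin(axis=-1)
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z = np.array([[0.9, 1.1], [0.1, -0.2]])
zq, idx = quantize(z, codebook)
print(idx)  # → [1 0]
```

Because the latent code is now a grid of integer indices rather than real-valued vectors, the representation is genuinely discrete, which is what lets the model sidestep the "posterior collapse" issues of continuous VAEs.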
Tue, 26 Jun 2018 04:25:50 -0600

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music (arXiv:1803.05428)
Summary by CodyWild

I've spent the last few days pretty deep in the weeds of GAN theory - with all its attendant sample-squinting and arcane training diagnosis - and so today I'm shifting gears to an applied paper that mostly showcases some clever modifications of an underlying technique. The goal of the MusicVAE is as you might expect: to make music. But the goal isn't just the ability to produce patterns of notes that sound musical; it's the ability to learn a vector space where we can modify the values ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.05428#decodyng
Tue, 26 Jun 2018 04:24:17 -0600

Adversarially Learned Inference (arXiv:1606.00704)
Summary by CodyWild

Despite their difficulties in training, Generative Adversarial Networks are still one of the most exciting recent ideas in machine learning: a way to generate data without the fuzziness and averaging of earlier methods. However, up until recently, there had been a major way in which the GAN's primary competitor in the field, the Variational Autoencoder, was superior: it could do inference.
Intuitively, inference is the inverse of generation. Whereas generation works by taking some source of ra...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1606.00704#decodyng
Tue, 26 Jun 2018 04:22:27 -0600

Least Squares Generative Adversarial Networks (arXiv:1611.04076)
Summary by CodyWild

Generative Adversarial Networks (GANs) are an exciting technique, a kernel of an effective concept that has been shown to be able to overcome many of the problems of previous generative models: particularly the fuzziness of VAEs. But, as I've mentioned before, and as you've doubtless read if you've read any material about the topic, they're finicky things, difficult to train in a stable way, and particularly prone to devolving into mode collapse. Mode collapse is a phenomenon where...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.04076#decodyng
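The paper's core change is to replace the usual sigmoid cross-entropy GAN loss with a least-squares objective. A sketch of the two losses with the 0/1/1 labels the paper proposes; the function and variable names are mine, and real training would apply these to discriminator outputs on sampled batches.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake, a=0.0, b=1.0):
    # Discriminator pulls real scores toward label b and fake scores toward a.
    return 0.5 * np.mean((d_real - b) ** 2) + 0.5 * np.mean((d_fake - a) ** 2)

def lsgan_g_loss(d_fake, c=1.0):
    # Generator pulls fake scores toward c. The squared penalty still gives
    # useful gradient for fakes the discriminator already confidently
    # rejects, where a saturated sigmoid loss would go flat.
    return 0.5 * np.mean((d_fake - c) ** 2)

d_fake = np.array([0.1, 0.2])
print(lsgan_g_loss(d_fake))  # → 0.3625
```

Penalizing samples by their squared distance from the decision boundary, rather than just which side of it they land on, is what the authors argue stabilizes training.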
Tue, 26 Jun 2018 04:21:15 -0600

Unrolled Generative Adversarial Networks (arXiv:1611.02163)
Summary by CodyWild

If you've ever read a paper on Generative Adversarial Networks (from now on: GANs), you've almost certainly heard the author refer to the scourge upon the land of GANs that is mode collapse. When a generator succumbs to mode collapse, it means that, instead of modeling the full distribution of input data, it will choose one region where there is a high density of data and put all of its generated probability weight there. Then, on the next round, the discriminator pushes strongly away fr...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.02163#decodyng
Tue, 26 Jun 2018 04:20:16 -0600

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (arXiv:1703.10593)
Summary by CodyWild

Over the last five years, artificial creative generation powered by ML has blossomed. We can now imagine buildings based off of a sketch, peer into the dog-tiled "dreams" of a convolutional net, and, as of 2017, turn images of horses into ones of zebras. This last problem - typically termed image-to-image translation - is the one that CycleGAN focuses on. The kinds of transformations that can fall under this category are pretty conceptually broad: zebras to horses, summer scenes to winter ones...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.10593#decodyng
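The constraint that gives CycleGAN its name is that translating forward and back should recover the input. A sketch of the L1 cycle-consistency term, with toy invertible functions standing in for the two generator networks; the weight of 10 follows the paper, the rest of the names are mine.

```python
import numpy as np

def cycle_loss(x, y, G, F, lam=10.0):
    """L1 cycle-consistency: G maps X->Y and F maps Y->X, and
    F(G(x)) should recover x (likewise G(F(y)) should recover y)."""
    return lam * (np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean())

# Toy stand-ins: exact inverse mappings give zero cycle loss.
G = lambda x: 2.0 * x + 1.0
F = lambda y: (y - 1.0) / 2.0
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
print(cycle_loss(x, y, G, F))  # → 0.0
```

This term is what substitutes for paired training data: without it, the adversarial losses alone would let the generators map an input to any plausible output, not a corresponding one.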
Tue, 26 Jun 2018 04:18:31 -0600

Women also Snowboard: Overcoming Bias in Captioning Models (arXiv:1803.09797)
Summary by Abir Das

Concern about the issue of fairness (or the lack of it) in machine learning models is gaining widespread visibility among the general public, governments, and researchers. This is especially alarming as AI-enabled systems are becoming more and more pervasive in our society, with decisions being taken by AI agents in healthcare, autonomous driving, criminal justice, and so on. Bias in any dataset is, in some way or other, a reflection of the general attitude of humankind towards dif...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.09797#dasabir
Sun, 24 Jun 2018 23:59:32 -0600