ShortScience.org Latest Summaries
http://www.shortscience.org/
Fri, 28 Feb 2020 19:01:01 +0000 | 1702.08591 | journals/corr/1702.08591
**The Shattered Gradients Problem: If resnets are the answer, then what is the question?** (summary by Gavin Gray)
Imagine you make a neural network mapping a scalar to a scalar. After you initialise this network in the traditional way, randomly with some given variance, you could take the gradient of the output with respect to the input for all reasonable values (between about -3 and 3, because networks typically assume standardised inputs). As the value increases, different rectified linear units in the network will randomly switch on, drawing a random walk in the gradients; another name for which is brown ...
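The random-walk picture above is easy to reproduce. The following is a toy sketch of my own construction (not the paper's code): a one-hidden-layer ReLU network mapping scalar to scalar, with its input gradient evaluated across the standardised input range.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 256  # hidden units (arbitrary choice for this sketch)
W1 = rng.normal(0, 1, size=H)               # input -> hidden weights
b1 = rng.normal(0, 1, size=H)
W2 = rng.normal(0, 1 / np.sqrt(H), size=H)  # hidden -> output weights

def grad_wrt_input(x):
    """d(output)/d(input) for y = W2 . relu(W1*x + b1)."""
    active = (W1 * x + b1) > 0  # which ReLUs are switched on at this x
    return np.sum(W2 * W1 * active)

xs = np.linspace(-3, 3, 601)
gs = np.array([grad_wrt_input(x) for x in xs])
# As x sweeps the input range, units switch on and off and the piecewise
# constant gradient takes on a random-walk-like, "brown noise" appearance.
```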
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.08591#gngdb
Wed, 26 Feb 2020 22:21:42 +0000 | 1810.00597 | journals/corr/1810.00597
**Taming VAEs** (summary by Gavin Gray)
The paper provides derivations and intuitions about the learning dynamics of VAEs, based on observations about [$\beta$-VAEs][beta]. Using these, the authors derive an alternative way to constrain the training of VAEs that doesn't require the typical heuristics, such as warm-up or adding noise to the data.
How exactly would this change a typical implementation? Typically, SGD is used to [optimize the ELBO directly](). Using GECO, I keep a moving average of my constraint $C$ (chosen based on what I want the V...
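That moving-average bookkeeping can be sketched in a few lines; the exponential form of the multiplier update and all constants here are my assumptions, not the paper's exact scheme:

```python
import numpy as np

def geco_lagrange_step(lmbda, c_ma, constraint, alpha=0.99, lr=0.01):
    """One GECO-style update (sketch).

    lmbda:      current Lagrange multiplier
    c_ma:       moving average of the constraint C (e.g. reconstruction
                error minus a tolerance), or None on the first step
    constraint: current batch value of C
    """
    c_ma = constraint if c_ma is None else alpha * c_ma + (1 - alpha) * constraint
    # Multiplier grows while the constraint is violated (C > 0), shrinks otherwise.
    lmbda = lmbda * np.exp(lr * c_ma)
    return lmbda, c_ma

lmbda, c_ma = 1.0, None
for c in [0.5, 0.4, 0.3, -0.1, -0.2]:  # toy per-batch constraint values
    lmbda, c_ma = geco_lagrange_step(lmbda, c_ma, c)
```

The ELBO's KL term would then be weighted by `lmbda` in the training loss.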
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.00597#gngdb
Mon, 24 Feb 2020 21:54:36 +0000 | conf/iclr/LuoSumo2020
**SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models** (summary by Chin-Wei)
In this note, I'll implement the [Stochastically Unbiased Marginalization Objective (SUMO)]() to estimate the log-partition function of an energy function.
Estimation of the log-partition function has many important applications in machine learning. Take latent variable models or Bayesian inference: the log-partition function of the posterior distribution $$p(z|x)=\frac{1}{Z}p(x|z)p(z)$$ is the log-marginal likelihood of the data $$\log Z = \log \int p(x|z)p(z)dz = \log p(x)$$.
More generally, l...
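For context, here is the simplest estimator, a toy construction of my own: importance sampling gives an unbiased estimate of $Z$, but taking the log of the sample average gives a biased estimate of $\log Z$ (a lower bound in expectation, by Jensen's inequality), which is the bias SUMO is designed to remove. On a 1-D unnormalised Gaussian, $\log Z$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def unnorm_log_density(z):
    return -0.5 * z**2  # unnormalised N(0,1); true Z = sqrt(2*pi)

# Proposal q(z) = N(0, s^2)
s = 2.0
z = rng.normal(0, s, size=200_000)
log_q = -0.5 * (z / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))
log_w = unnorm_log_density(z) - log_q

# log of the importance-sampling average, computed stably in log space;
# biased low for finite samples, which is what SUMO corrects
log_Z_est = np.logaddexp.reduce(log_w) - np.log(len(z))
log_Z_true = 0.5 * np.log(2 * np.pi)
```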
http://www.shortscience.org/paper?bibtexKey=conf/iclr/LuoSumo2020#cw
Mon, 17 Feb 2020 05:27:33 +0000 | 10.1109/tvcg.2019.2893247
**Interaction-based Human Activity Comparison** (summary by Oleksandr Bailo)
This paper proposes an approach to measure motion similarity in human-human and human-object interactions. The authors claim that human activities are usually defined by the interaction between individual characters, such as a high-five.
As suitable interaction datasets are not available, the authors provide multiple small-scale interaction datasets:
- 2C = a Character-Character (2C) database using kick-boxing motions
- CRC = Character-Retargeted Character where the size of charac...
http://www.shortscience.org/paper?bibtexKey=10.1109/tvcg.2019.2893247#ukrdailo
Tue, 04 Feb 2020 08:51:20 +0000 | Kool2020Estimating
**Estimating Gradients for Discrete Random Variables by Sampling without Replacement** (summary by Gavin Gray)
It's a shame that the authors weren't able to continue their series of [great][reinforce] [paper][attention] [titles][beams], although it looks like they thought about calling this paper **"Put Replacement In Your Basement"**. Also, although they don't say it in the title or abstract, this paper introduces an estimator the authors call the **"unordered set estimator"** which, as a name, is not the best. However, this is one of the most exciting estimators for gradients of non-differentiable expe...
http://www.shortscience.org/paper?bibtexKey=Kool2020Estimating#gngdb
Mon, 03 Feb 2020 14:53:31 +0000 | sammon1969mapping
**A Nonlinear Mapping for Data Structure Analysis** (summary by Joseph Paul Cohen)
This paper presents what is known as `Sammon's mapping`. This method produces points in any $\mathbb{R}^n$ space using only a distance function between points. You can define any distance function $d^*$ that represents relationships between points; this function can even be non-symmetric. The power is that any relationship encoded in a distance function or distance matrix can be visualized.
For mapping $n$ points from one dimension into another, the algorithm starts by generating $n$ random poi...
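The iteration can be sketched as plain gradient descent on Sammon's stress; note the original paper uses a diagonal-Newton step, and the step size, iteration count, and names here are my choices:

```python
import numpy as np

def pdist(Y):
    """Pairwise Euclidean distance matrix."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(-1) + 1e-12)

def sammon_stress(D, d):
    """Sammon's stress between input distances D and output distances d."""
    mask = ~np.eye(D.shape[0], dtype=bool)
    return np.sum((D - d)[mask] ** 2 / D[mask]) / D[mask].sum()

def sammon(D, dim=2, iters=1500, lr=0.01, seed=0):
    """Map n points to `dim` dimensions using only the distance matrix D."""
    n = D.shape[0]
    Y = np.random.default_rng(seed).normal(size=(n, dim))
    Dm = D.copy()
    np.fill_diagonal(Dm, 1.0)          # avoid division by zero on the diagonal
    for _ in range(iters):
        d = pdist(Y)
        np.fill_diagonal(d, 1.0)
        coef = (d - Dm) / (Dm * d)     # dE/dd_ij, up to a positive constant
        np.fill_diagonal(coef, 0.0)
        grad = (coef[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(1)
        Y -= lr * grad
    return Y

# toy data: 20 points in 3-D, embedded in 2-D
X = np.random.default_rng(1).normal(size=(20, 3))
D = pdist(X)
Y = sammon(D)
```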
http://www.shortscience.org/paper?bibtexKey=sammon1969mapping#joecohen
Tue, 21 Jan 2020 05:22:57 +0000 | conf/nips/ZhangM18
**Generalizing Tree Probability Estimation via Bayesian Networks** (summary by Gavin Gray)
A common problem in phylogenetics is:
1. I have $p(\text{DNA sequences} | \text{tree})$ and $p(\text{tree})$.
2. I've used these to run an MCMC algorithm and generate many (approximate) samples from $p(\text{tree} | \text{DNA sequences})$.
3. I want to evaluate $p(\text{tree} | \text{DNA sequences})$.
The first solution you might think of is to add up how many times you saw each *tree topology* and divide by the total number of MCMC samples; referred to in this paper as *simple sample relative...
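That first estimator is just counting; a toy sketch with made-up topology strings (any canonical encoding under which identical topologies compare equal would do):

```python
from collections import Counter

# Stand-in for MCMC output: each sample is a tree topology, written here
# as a canonical newick-like string so identical topologies compare equal.
samples = ["((A,B),C)", "((A,C),B)", "((A,B),C)", "((A,B),C)"]

counts = Counter(samples)
# simple sample relative frequency estimate of p(tree | DNA sequences)
srf = {topo: n / len(samples) for topo, n in counts.items()}
```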
http://www.shortscience.org/paper?bibtexKey=conf/nips/ZhangM18#gngdb
Tue, 14 Jan 2020 16:36:42 +0000 | 10.1145/3178876.3186154
**Latent Relational Metric Learning via Memory-based Attention for Collaborative Ranking** (summary by Darel)
This work is a direct improvement on Collaborative Metric Learning. While CML tries to learn user and item embeddings directly, placing them in a metric space and adjusting them with a triplet loss, this paper focuses on introducing latent relational vectors.
A relational vector $r$ must describe the relation between user $p$ and item $q$ such that $s(p,q)=\parallel p + r - q \parallel \approx 0$.
Vectors $r$ are introduced as a softmax-weighted linear combination of vectors from La...
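A sketch of that attention mechanism; the elementwise product `p * q` used as the key here is a simplification of mine, not the paper's exact learned attention:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, slots = 8, 6                       # embedding size and memory slots (arbitrary)
memory = rng.normal(size=(slots, dim))  # latent relation memory module

def relation_vector(p, q):
    """Softmax-weighted combination of memory rows, keyed on the user-item pair."""
    logits = memory @ (p * q)           # simplified joint key of user and item
    w = np.exp(logits - logits.max())   # stable softmax
    w /= w.sum()
    return w @ memory

p, q = rng.normal(size=dim), rng.normal(size=dim)
r = relation_vector(p, q)
score = np.linalg.norm(p + r - q)       # s(p, q): trained to be ~0 for true pairs
```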
http://www.shortscience.org/paper?bibtexKey=10.1145/3178876.3186154#darel
Fri, 10 Jan 2020 14:26:12 +0000 | conf/asunam/JamshidiRL18
**Trojan Horses in Amazon's Castle: Understanding the Incentivized Online Reviews** (summary by SOJA)
During the past few years, sellers have increasingly offered discounted or free products to selected reviewers on e-commerce platforms in exchange for their reviews. Such incentivized (and often very positive) reviews can improve the rating of a product, which in turn sways other users’ opinions about the product.
Here, we examine the problem of detecting and characterizing incentivized reviews in two primary categories of Amazon products. We show that the key features of EIRs and normal revi...
http://www.shortscience.org/paper?bibtexKey=conf/asunam/JamshidiRL18#soja
Thu, 09 Jan 2020 23:47:17 +0000 | 1811.11804 | journals/corr/1811.11804
**19 dubious ways to compute the marginal likelihood of a phylogenetic tree topology** (summary by Gavin Gray)
This paper compares methods for calculating the marginal likelihood, $p(D | \tau)$, when you have a tree topology $\tau$ and some data $D$ and you need to marginalise over the possible branch lengths $\mathbf{\theta}$ in the process of Bayesian inference. In other words, solving the following integral:
$$
\int_{ [ 0, \infty )^{2S - 3} } p(D | \mathbf{\theta}, \tau ) p( \mathbf{\theta} | \tau) d \mathbf{\theta}
$$
There are some details about this problem that are common to phylogenetic problems, ...
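The crudest way to approximate this integral is plain Monte Carlo from the prior: draw branch lengths from $p(\mathbf{\theta}|\tau)$ and average the likelihood. A toy sketch with a stand-in likelihood and an assumed exponential prior (an actual phylogenetic likelihood would come from the pruning algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_likelihood(theta):
    # stand-in for p(D | theta, tau); only the shape of the computation matters
    return -0.5 * ((theta - 1.0) ** 2).sum(axis=-1)

S = 4  # taxa, so an unrooted tree has 2S - 3 = 5 branch lengths
theta = rng.exponential(scale=1.0, size=(100_000, 2 * S - 3))  # prior p(theta|tau)
log_lik = log_likelihood(theta)
# log of the Monte Carlo average, computed stably in log space
log_ml = np.logaddexp.reduce(log_lik) - np.log(len(log_lik))
```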
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.11804#gngdb
Fri, 27 Dec 2019 16:32:04 +0000 | conf/www/HsiehYCLBE17
**Collaborative Metric Learning** (summary by Darel)
## Idea
Use implicit feedback and item features to project users and items into the same latent space, for later use with kNN. The learned metric encodes user-item, user-user and item-item relationships.
## Loss
Users and items are represented by vectors $u_i \in \mathbb{R}^r, v_i \in \mathbb{R}^r$.
We define the Euclidean distance as $d(i,j) = \parallel u_i - v_j \parallel$.
Loss function consists of 3 parts:
$$\mathcal{L}=\mathcal{L}_m + \lambda_f\mathcal{L}_f + \lambda_c\mathcal{L}_c$$
### Weighted ...
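The metric term $\mathcal{L}_m$ is a triplet hinge loss on those distances; a minimal sketch of mine, omitting the rank-based weighting the paper adds:

```python
import numpy as np

def d(u, v):
    """Euclidean distance between user and item embeddings."""
    return np.linalg.norm(u - v)

def triplet_hinge(u, v_pos, v_neg, margin=1.0):
    """Pull the positive item closer to the user than the negative item,
    by at least `margin` (the paper additionally rank-weights this term)."""
    return max(0.0, d(u, v_pos) ** 2 - d(u, v_neg) ** 2 + margin)

u = np.zeros(2)
v_pos, v_neg = np.array([0.1, 0.0]), np.array([3.0, 0.0])
loss = triplet_hinge(u, v_pos, v_neg)  # 0.0: positive already closer by the margin
```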
http://www.shortscience.org/paper?bibtexKey=conf/www/HsiehYCLBE17#darel
Fri, 27 Dec 2019 15:46:33 +0000 | 1911.13299 | ramanujan2019whats
**What's Hidden in a Randomly Weighted Neural Network?** (summary by devin132)
The paper "Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask" by Zhou et al. (2019) found that by learning only binary masks one can find random subnetworks that do much better than chance on a task. This new paper builds on that method by proposing a stronger algorithm than Zhou et al.'s for finding these high-performing subnetworks.
The intuition follows: "If a neural network with random weights (center) is sufficiently overparameterized, it will contain a subnetwork (right) that pe...
http://www.shortscience.org/paper?bibtexKey=ramanujan2019whats#devin132
Wed, 25 Dec 2019 16:45:12 +0000 | 1805.06370 | journals/corr/1805.06370
**Progress & Compress: A scalable framework for continual learning** (summary by devin132)
Proposes a two-stage approach for continual learning: an active learning phase and a consolidation phase. The active learning stage optimizes for a specific task, which is then consolidated into the knowledge-base network via Elastic Weight Consolidation (Kirkpatrick et al., 2016). The active learning phase uses a separate network from the knowledge base, but it is not always trained from scratch; the authors suggest a heuristic based on task similarity. It improves EWC by deriving a new online method so ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.06370#devin132
Wed, 25 Dec 2019 16:10:54 +0000 | conf/recsys/XinMPLA17
**Folding: Why Good Models Sometimes Make Spurious Recommendations** (summary by Darel)
One bad item can reduce the perceived quality of a recommendation list. Sometimes this is particularly undesirable, such as recommending horror movies to children. The authors argue that this happens when missing-not-at-random data is handled improperly and separate groups of users and items overlap during dimensionality reduction and the computation of embeddings. Folding is a metric that measures the severity of the described effect in a recommendation model.
To calculate folding we must intr...
http://www.shortscience.org/paper?bibtexKey=conf/recsys/XinMPLA17#darel
Tue, 24 Dec 2019 22:13:20 +0000 | conf/um/FrumermanSSS19
**Are All Rejected Recommendations Equally Bad?: Towards Analysing Rejected Recommendations** (summary by Darel)
## Idea
When we recommend items to users, some of them are not chosen by the user. These rejected recommendations are usually treated as hard mistakes.
The authors argue that these rejected recommendations may still influence the user's choice even though they were not picked. For example, a user didn't click on "Die Hard" but watched another Bruce Willis movie. That recommendation seems not so bad after all, and maybe we should not penalize it as hard as we usually do.
Ultimate goal is to invent a me...
http://www.shortscience.org/paper?bibtexKey=conf/um/FrumermanSSS19#darel
Fri, 20 Dec 2019 15:47:44 +0000 | 1906.05243 | journals/corr/abs-1906-05243
**When to use parametric models in reinforcement learning?** (summary by CodyWild)
This paper is a bit provocative (especially in light of the recent DeepMind MuZero paper), and it poses some interesting questions about the value of model-based planning. I'm not sure I agree with the overall argument it's making, but reading it made me hone my intuitions around why and when model-based planning should be useful.
The overall argument of the paper is: rather than learning a dynamics model of the environment and then using that model to plan and learn...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-05243#decodyng
Fri, 29 Nov 2019 17:48:19 +0000 | 1905.12506 | journals/corr/abs-1905-12506
**Are Disentangled Representations Helpful for Abstract Visual Reasoning?** (summary by CodyWild)
Arguably, the central achievement of the deep learning era is multi-layer neural networks' ability to learn useful intermediate feature representations using a supervised learning signal. In a supervised task, it's easy to define what makes a feature representation useful: being easier for a subsequent layer to use in making the final class prediction. When we want to learn features in an unsupervised way, things get a bit trickier. There's the obvious problem of what kinds of problem st...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1905-12506#decodyng
Fri, 29 Nov 2019 07:38:52 +0000 | 1906.02768 | journals/corr/abs-1906-02768
**Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP** (summary by CodyWild)
Summary: An odd thing about machine learning these days is how far you can get in a line of research while only ever testing your method on image classification and image datasets in general. This leads one occasionally to wonder whether a given phenomenon or advance is a discovery of the field generally, or whether it's just a fact about the informatics and learning dynamics inherent in image data.
This paper, part of a set of recent papers released by Facebook centering around the Lottery Ti...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-02768#decodyng
Thu, 28 Nov 2019 18:44:16 +0000 | 1906.02425 | journals/corr/abs-1906-02425
**Uncertainty-guided Continual Learning with Bayesian Neural Networks** (summary by Massimo Caccia)
## Introduction
Bayesian Neural Networks (BNNs) provide an intrinsic importance model based on weight uncertainty; variational inference can approximate the posterior distribution, using Monte Carlo sampling for gradient estimation. A BNN acts like an ensemble method in that it reduces the prediction variance, but it uses only 2x the number of parameters.
The idea is to use the BNN's uncertainty to guide gradient descent so that it does not update the important weights when learning new tasks.
## Bayes by Backprop (BBB)
Where $q...
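The BBB reparameterisation can be sketched in a few lines: each weight has a mean $\mu$ and a parameter $\rho$, and a sample is drawn as $w = \mu + \log(1+e^{\rho}) \cdot \epsilon$ (the shapes and the initial $\rho$ value below are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_weights(mu, rho):
    """Bayes by Backprop: reparameterised draw from q(w) = N(mu, sigma^2),
    with sigma = softplus(rho) so sigma stays positive. Gradients flow to
    mu and rho through this sample, hence only 2x the parameters."""
    sigma = np.log1p(np.exp(rho))
    eps = rng.normal(size=mu.shape)
    return mu + sigma * eps

mu = np.zeros((4, 3))
rho = np.full((4, 3), -3.0)  # small initial sigma: softplus(-3) ~ 0.049
w = sample_weights(mu, rho)
```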
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-02425#mcaccia
Wed, 27 Nov 2019 23:18:04 +0000 | 1906.02773 | journals/corr/abs-1906-02773
**One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers** (summary by CodyWild)
In my view, the Lottery Ticket Hypothesis is one of the weirder and more mysterious phenomena of the last few years of machine learning. We've known for a while that we can take trained networks and prune them down to a small fraction of their weights (keeping those weights with the highest magnitudes) and maintain test performance using only those learned weights. That seemed somewhat surprising, in that there were a lot of weights that weren't actually necessary to encoding the learned function...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-02773#decodyng
Wed, 27 Nov 2019 01:41:31 +0000 | 1906.00446 | journals/corr/abs-1906-00446
**Generating Diverse High-Fidelity Images with VQ-VAE-2** (summary by CodyWild)
VQ-VAE is a Variational AutoEncoder that uses as its information bottleneck a discrete set of codes rather than a continuous vector. That is, the encoder creates a downsampled spatial representation of the image, where each grid cell of the downsampled image is represented by a vector. But before that vector is passed to the decoder, it's discretized by (effectively) clustering the vectors the network has historically seen, and substituting each vector with the center of the vect...
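That discretisation step is a nearest-neighbour lookup into a codebook; a minimal sketch (codebook size, dimensions, and names are my choices, and the straight-through gradient trick is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 32, 8  # codebook size and code dimension (arbitrary)
codebook = rng.normal(size=(K, D))

def quantize(z):
    """Replace each spatial vector with its nearest codebook entry.
    z: (H, W, D) encoder output -> (quantized (H, W, D), code indices (H, W))."""
    flat = z.reshape(-1, D)
    # squared distances from every spatial vector to every code: (N, K)
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return codebook[idx].reshape(z.shape), idx.reshape(z.shape[:-1])

z = rng.normal(size=(4, 4, D))  # toy 4x4 encoder output
zq, idx = quantize(z)
```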
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-00446#decodyng
Tue, 26 Nov 2019 02:14:37 +0000 | 1904.00760 | journals/corr/abs-1904-00760
**Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet** (summary by CodyWild)
When talking about modern machine learning, particularly on images, it can feel like deep neural networks are a world unto themselves when it comes to complexity. On one hand, there are straightforward things like hand-designed features and linear classifiers, and on the other, there are these deep, heavily-interacting networks that dazzle us with their performance but seem almost unavoidably difficult to hold in our heads or interpret. This paper, from ICLR 2019 earlier this year, investig...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-00760#decodyng
Mon, 25 Nov 2019 06:29:03 +0000 | 1911.08265 | journals/corr/1911.08265
**Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model** (summary by CodyWild)
The successes of deep learning on complex strategic games like Chess and Go have been largely driven by the ability to do tree search: that is, simulating sequences of actions in the environment, and then training policy and value functions to more speedily approximate the results that more exhaustive search reveals. However, this relies on having a good simulator that can predict the next state of the world, given your action. In some games, with straightforward rules, this is easy to explicitl...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1911.08265#decodyng
Sun, 24 Nov 2019 02:00:36 +0000 | 1707.03497 | journals/corr/OhSL17
**Value Prediction Network** (summary by CodyWild)
Recently, DeepMind released a new paper showing strong performance on board-game tasks using a mechanism similar to the Value Prediction Network in this paper, which inspired me to go back and get a grounding in this earlier work.
A goal of this paper is to design a model-based RL approach that can scale to complex environment spaces, but can still be used to run simulations and do explicit planning. Traditionally, model-based RL has worked by learning a dynamics model of the environment - p...
http://www.shortscience.org/paper?bibtexKey=journals/corr/OhSL17#decodyng
Sat, 23 Nov 2019 01:31:07 +0000 | journals/pami/DengM01
**Unsupervised Segmentation of Color-Texture Regions in Images and Video** (summary by Desiana Nurchalifah)
**Introduction**
Object segmentation methods often produce imprecise results, as objects do not always agree with homogeneous regions. This paper therefore provides a segmentation of images and videos into regions homogeneous in color and texture cues, called JSEG. The assumptions for the environments used are:
* Image contains homogeneous color and texture regions
* Color is quantized
* There are distinct colors in neighboring regions
**Related work**
* Present work in image segment...
http://www.shortscience.org/paper?bibtexKey=journals/pami/DengM01#desiananurchalifah
Fri, 22 Nov 2019 08:40:57 +0000 | 10.1109/cvpr.2014.118
**Salient Region Detection via High-Dimensional Color Transform** (summary by Desiana Nurchalifah)
**Introduction**
* A salient region is an area where a striking combination of image features is perceived at first observation.
* These features combine to make a region significantly distinct from other areas in the image.
* This paper presents a saliency map built from a linear combination in a high-dimensional color representation space.
**Related work**
* Present work in saliency detection is divided into two groups: methods taking into account low-level features, and statist...
http://www.shortscience.org/paper?bibtexKey=10.1109/cvpr.2014.118#desiananurchalifah
Fri, 22 Nov 2019 08:27:47 +0000 | conf/eccv/MairHBSH10
**Adaptive and Generic Corner Detection Based on the Accelerated Segment Test** (summary by Desiana Nurchalifah)
**Introduction:**
A corner, as a feature cue in an image, is defined by the intersection of two edges. This definition has the benefit of allowing precise localization of the cue, although it is only valid when locality is maintained and the result is close to the real corner location.
**Related work:**
* Existing corner detection methods include the SIFT global tracker, which uses a Difference of Gaussians, and SURF, which uses Haar wavelets to approximate the Hessian determinant. These methods have the drawback of high comput...
http://www.shortscience.org/paper?bibtexKey=conf/eccv/MairHBSH10#desiananurchalifah
Fri, 22 Nov 2019 08:17:13 +0000 | 1907.00456 | journals/corr/abs-1907-00456
**Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog** (summary by CodyWild)
Given the tasks that RL is typically used to perform, it can be easy to equate the problem of reinforcement learning with "learning dynamically, online, as you take actions in an environment". And while this does represent most RL problems in the literature, it is possible to train a reinforcement learning system in an off-policy way (read: trained on data that the policy itself didn't collect), and there can be compelling reasons to prefer this approach. In this paper, which seeks to train ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1907-00456#decodyng
Fri, 22 Nov 2019 02:36:02 +0000 | 1910.10683 | journals/corr/abs-1910-10683
**Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer** (summary by CodyWild)
At a high level, this paper is a massive (34 pages!) and highly-resourced study of many nuanced variations of language pretraining tasks, to see which of those variants produce models that transfer best to new tasks. As a result, it doesn't lend itself *that* well to being summarized into a central kernel of understanding. So, I'm going to do my best to pull out some high-level insights, and recommend you read the paper in more depth if you're working particularly in language pretraining and w...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1910-10683#decodyng
Thu, 21 Nov 2019 02:25:38 +0000 | 1910.12911 | igl2019generalization
**Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck** (summary by CodyWild)
Coming from the perspective of the rest of machine learning, a somewhat odd thing about reinforcement learning that often goes unnoticed is that, in basically all reinforcement learning, the performance of an algorithm is judged by its performance on the same environment it was trained on. In the parlance of ML writ large: training on the test set. In RL, most of the focus has historically been on whether automatic systems would be able to learn a policy from the state distribution of a sin...
http://www.shortscience.org/paper?bibtexKey=igl2019generalization#decodyng
Wed, 20 Nov 2019 02:30:07 +0000 | 1908.01517 | journals/corr/abs-1908-01517
**Adversarial Self-Defense for Cycle-Consistent GANs** (summary by CodyWild)
Domain translation - for example, mapping from a summer to a winter scene, or from a photorealistic image to an object segmentation map - is often performed by GANs through something called a cycle-consistency loss. This model works by having, for each domain, a generator to map domain A into domain B, and a discriminator to differentiate between real images from domain B and those constructed through the cross-domain generator. With a given image in domain A, training happens by using ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1908-01517#decodyng
Sun, 17 Nov 2019 07:49:55 +0000 | 1910.04744 | journals/corr/abs-1910-04744
**CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning** (summary by CodyWild)
In machine learning, our models are lazy: they're only ever as good as the datasets we train them on. If a task doesn't require a given capability in order for a model to solve it, then the model won't gain that capability. This fact motivates researchers to construct new datasets, to provide both a source of signal and a not-yet-met standard against which models can be measured. This paper focuses on the domain of reasoning about videos and the objects within them across...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1910-04744#decodyng
Fri, 15 Nov 2019 06:28:28 +0000 | 1906.02530 | journals/corr/abs-1906-02530
**Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift** (summary by CodyWild)
A common critique of deep learning is its brittleness off-distribution, combined with its tendency to give confident predictions for off-distribution inputs, as is seen in the case of adversarial examples. In response to this critique, a number of different methods have cropped up in recent years that try to capture a model's uncertainty as well as its overall prediction. This paper does a broad evaluation of uncertainty methods and, particularly, tests how they perform on out-of-dist...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-02530#decodyng
Thu, 14 Nov 2019 03:00:05 +0000 | 1906.05838 | journals/corr/abs-1906-05838
**Goal-conditioned Imitation Learning** (summary by CodyWild)
This paper combines the imitation learning algorithm GAIL with recent advances in goal-conditioned reinforcement learning, to create a combined approach that can make efficient use of demonstrations, but can also learn information about a reward that can allow the agent to outperform the demonstrator.
Goal-conditioned learning is a form of reward-driven reinforcement learning where the reward is defined to be 1 when an agent reaches a particular state, and 0 otherwise. This can be a particularly...
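That sparse goal-conditioned reward is essentially one line; a sketch for continuous states, where the tolerance check is my addition (in discrete state spaces this is exact equality):

```python
import numpy as np

def goal_reward(state, goal, tol=1e-6):
    """Goal-conditioned sparse reward: 1 on reaching the goal state, else 0."""
    close = np.linalg.norm(np.asarray(state) - np.asarray(goal)) <= tol
    return 1.0 if close else 0.0
```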
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-05838#decodyng
Wed, 13 Nov 2019 05:49:58 +0000 | conf/icml/FinnRKL19
**Online Meta-Learning** (summary by Massimo Caccia)
## Introduction
Two distinct research paradigms have studied how prior tasks or experiences can be used by an agent to inform future learning.
* Meta-learning: past experience is used to acquire a prior over model parameters or a learning procedure; this typically studies a setting where a set of meta-training tasks is made available together upfront
* Online learning: a sequential setting where tasks are revealed one after another, but which aims to attain zero-shot generalization without any tas...
http://www.shortscience.org/paper?bibtexKey=conf/icml/FinnRKL19#mcaccia
Wed, 13 Nov 2019 00:26:04 +0000 | 1810.04777 | journals/corr/1810.04777
**Rao-Blackwellized Stochastic Gradients for Discrete Distributions** (summary by Gavin Gray)
This paper approaches the problem of optimizing the parameters of a discrete distribution with respect to some loss function that is an expectation over that distribution. In other words, a typical experiment will be a variational autoencoder with discrete latent variables, but there are many real applications:
$$
\mathcal{L} (\eta) : = \mathbb{E}_{z \sim q_{\eta} (z)} \left[ f_{\eta} (z) \right]
$$
Using the [product rule of differentiation][product] the derivative of this loss function can be ...
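The score-function (REINFORCE) term that appears in that product-rule derivative can be sketched for a categorical $q_\eta$ with softmax logits; the toy $f$, sample count, and names here are my choices, and the Rao-Blackwellization step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def reinforce_grad(eta, f, n_samples=50_000):
    """Estimate d/d_eta of E_{z ~ Categorical(softmax(eta))}[f(z)] via the
    score-function identity: grad = E[f(z) * d log q(z) / d eta]."""
    probs = np.exp(eta - eta.max())
    probs /= probs.sum()
    z = rng.choice(len(eta), size=n_samples, p=probs)
    # d log q(z) / d eta = onehot(z) - probs, for the softmax parameterisation
    onehot = np.eye(len(eta))[z]
    fz = f(z)[:, None]
    return (fz * (onehot - probs)).mean(axis=0)

eta = np.array([0.0, 1.0, -1.0])
f = lambda z: z.astype(float)  # toy objective f(z) = z
g = reinforce_grad(eta, f)

# exact gradient for comparison: dE/d_eta_j = p_j * (f(j) - E[f])
probs = np.exp(eta) / np.exp(eta).sum()
exact = probs * (np.arange(3) - probs @ np.arange(3))
```

The high variance of this estimator is precisely what the paper's Rao-Blackwellization reduces.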
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.04777#gngdb
Tue, 12 Nov 2019 21:45:52 +0000 | 1909.11764 | journals/corr/abs-1909-11764
**FreeLB: Enhanced Adversarial Training for Language Understanding** (summary by CodyWild)
Adversarial examples, and defenses to prevent them, are often presented as a case of inherent model fragility, where the model is making a clear and identifiable mistake by misclassifying an input humans would classify correctly. But another frame on adversarial-examples research is that it's a way of imposing a certain kind of prior requirement on our models: that they be insensitive to certain scales of perturbation of their inputs. One reason to want to do this is because you believe the ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1909-11764#decodyng
Tue, 12 Nov 2019 08:17:00 +0000 | 1906.02403 | journals/corr/abs-1906-02403
**Ease-of-Teaching and Language Structure from Emergent Communication** (summary by CodyWild)
An interesting category of machine learning papers - to which this paper belongs - uses learning systems as a way to explore the incentive structures of problems whose equilibrium properties are difficult to reason about intuitively. In this paper, the authors are trying to better understand how different dynamics of a cooperative communication game between agents, where the speaking agent is trying to describe an object such that the listening agent picks the one the speak...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-02403#decodyng
Sat, 09 Nov 2019 04:24:54 +0000 | journals/tog/AbermanWLCC19
**Learning character-agnostic motion for motion retargeting in 2D** (summary by Oleksandr Bailo)
This paper presents a method to extract motion (dynamic) and skeleton / camera-view (static) representations from a video of a person represented as a 2D joint skeleton. This decomposition allows transferring the motion to different skeletons (retargeting) and much more. It does so by utilizing deep neural networks.
The architecture consists of motion and skeleton / camera-view encoders that decompose an input sequence of 2D joint positions into latent spaces and a decoder that reconstruc...
http://www.shortscience.org/paper?bibtexKey=journals/tog/AbermanWLCC19#ukrdailo
Fri, 08 Nov 2019 08:44:59 +0000 | 1910.14033 | journals/corr/abs-1910-14033
**Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control** (summary by CodyWild)
If you've been at all aware of machine learning in the past five years, you've almost certainly seen the canonical word2vec example demonstrating the additive properties of word embeddings: "king - man + woman = queen". This paper's goal is to design embeddings for agent plans or trajectories that follow similar principles, such that a task composed of multiple subtasks can be represented by adding the vectors corresponding to the subtasks. For example, if a task involved getting an ax and then ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1910-14033#decodyng
Fri, 08 Nov 2019 02:33:35 +0000 | Pavllo_2019_CVPR
**3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training** (summary by Oleksandr Bailo)
This paper proposes a method for 3D human pose estimation in video based on dilated temporal convolutions applied to 2D keypoints (the input to the network). The 2D keypoints can be obtained using any person keypoint detector; the paper uses Mask R-CNN with a ResNet-101 backbone, pre-trained on COCO and fine-tuned on 2D projections from Human3.6M.
The poses are presented as 2D keypoint coordinates, in contrast to using heatmaps (i.e. a Gaussian applied at the keypoint's 2D location). Thu...
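To make the contrast concrete, here is a minimal sketch (a hypothetical helper, not the paper's code) of the Gaussian-heatmap representation that the coordinate-based input replaces:

```python
import numpy as np

def keypoint_heatmap(x, y, height=64, width=64, sigma=2.0):
    """Render a 2D keypoint as a Gaussian heatmap: a dense map whose
    peak sits at the keypoint location (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))

# The coordinate representation is just the pair (20, 30); the heatmap
# encodes the same information as a full 64x64 array.
hm = keypoint_heatmap(20, 30)
assert hm.shape == (64, 64)
assert np.unravel_index(hm.argmax(), hm.shape) == (30, 20)  # (row, col)
```

Working on raw coordinates keeps the input to the temporal convolutions compact compared to a stack of such maps.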
http://www.shortscience.org/paper?bibtexKey=Pavllo_2019_CVPR#ukrdailo
http://www.shortscience.org/paper?bibtexKey=Pavllo_2019_CVPR#ukrdailoThu, 07 Nov 2019 04:31:23 +00001910.08210journals/corr/abs-1910-082103RTFM: Generalising to Novel Environment Dynamics via ReadingCodyWildReinforcement learning is notoriously sample-inefficient, and one reason why is that agents learn about the world entirely through experience, and it takes lots of experience to learn useful things. One solution you might imagine to this problem is the ones humans by and large use in encountering new environments: instead of learning everything through first-person exploration, acquiring lots of your knowledge by hearing or reading condensed descriptions of the world that can help you take more ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1910-08210#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1910-08210#decodyngThu, 07 Nov 2019 02:29:07 +0000conf/cvpr/0009XLW194Deep High-Resolution Representation Learning for Human Pose EstimationOleksandr BailoThis paper presents a top-down pose estimation method (i.e. it requires separate person detection) with a focus on improving high-resolution representations (features) to make keypoint detection easier.
During the training stage, this method utilizes annotated bounding boxes of person class to extract ground truth images and keypoints. The data augmentations include random rotation, random scale, flipping, and [half body augmentations]() (feeding upper or lower part of the body separately). Heatmap l...
http://www.shortscience.org/paper?bibtexKey=conf/cvpr/0009XLW19#ukrdailo
http://www.shortscience.org/paper?bibtexKey=conf/cvpr/0009XLW19#ukrdailoWed, 06 Nov 2019 03:33:07 +00001910.13038journals/corr/abs-1910-130383Learning to Predict Without Looking Ahead: World Models Without Forward PredictionCodyWildReinforcement Learning is often broadly separated into two categories of approaches: model-free and model-based. In the former category, networks simply take observations and input and produce predicted best-actions (or predicted values of available actions) as output. In order to perform well, the model obviously needs to gain an understanding of how its actions influence the world, but it doesn't explicitly make predictions about what the state of the world will be after an action is taken. In...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1910-13038#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1910-13038#decodyngWed, 06 Nov 2019 03:02:48 +000010.1007/978-3-030-01252-6_263MultiPoseNet: Fast Multi-Person Pose Estimation Using Pose Residual NetworkOleksandr BailoThe method is a multi-task learning model performing person detection, keypoint detection, person segmentation, and pose estimation. It is a bottom-up approach as it first localizes identity-free semantics and then groups them into instances.
Model structure:
- **Backbone**. The feature extractor is a ResNet (50 or 101) with one [Feature Pyramid Network]() (FPN) for the keypoint branch and one for the person detection branch. FPN enhances the extracted features through multi-level representation....
http://www.shortscience.org/paper?bibtexKey=10.1007/978-3-030-01252-6_26#ukrdailo
http://www.shortscience.org/paper?bibtexKey=10.1007/978-3-030-01252-6_26#ukrdailoTue, 05 Nov 2019 06:55:24 +00001905.10650journals/corr/abs-1905-106503Are Sixteen Heads Really Better than One?CodyWildIn the last two years, the Transformer architecture has taken over the worlds of language modeling and machine translation. The central idea of Transformers is to use self-attention to aggregate information from variable-length sequences, a task for which Recurrent Neural Networks had previously been the most common choice. Beyond that central structural change, one more nuanced change was from having a single attention mechanism on a given layer (with a single set of query, key, and value weigh...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1905-10650#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1905-10650#decodyngMon, 04 Nov 2019 16:48:39 +00001903.11780journals/corr/abs-1903-117804Wasserstein Dependency Measure for Representation LearningCodyWildSelf-Supervised Learning is a broad category of approaches whose goal is to learn useful representations by asking networks to perform constructed tasks that only use the content of a dataset itself, and not external labels. The idea is to design these tasks such that solving them requires the network to have learned useful representations. Some examples of this approach include predicting the rotation of rotated images, reconstructing color from greyscale, and, the topic of this paper, maximizing mu...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-11780#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-11780#decodyngMon, 04 Nov 2019 00:14:35 +00001906.07983journals/corr/abs-1906-079833Explanations can be manipulated and geometry is to blameCodyWildIn response to increasing calls for ways to explain and interpret the predictions of neural networks, one major genre of explanation has been the construction of salience maps for image-based tasks. These maps assign a relevance or saliency score to every pixel in the image, according to various criteria by which the value of a pixel can be said to have influenced the final prediction of the network. This paper is an interesting blend of ideas from the saliency mapping literature with ones from ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-07983#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-07983#decodyngSat, 02 Nov 2019 19:22:37 +00001908.07644journals/corr/abs-1908-076443Saccader: Improving Accuracy of Hard Attention Models for VisionCodyWildIf your goal is to interpret the predictions of neural networks on images, there are a few different ways you can focus your attention. One approach is to try to understand and attach conceptual tags to learnt features, to form a vocabulary with which models can be understood. However, techniques in this family have to contend with a number of challenges, from the difficulty in attaching clear concepts to the sheer number of neurons to interpret. An alternate approach, and the one pursued by thi...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1908-07644#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1908-07644#decodyngSat, 02 Nov 2019 05:16:45 +000010.1007/978-3-030-01228-1_253Videos as Space-Time Region GraphsOleksandr BailoThis paper tackles the challenge of action recognition by representing a video as space-time graphs: **similarity graph** captures the relationship between correlated objects in the video while the **spatial-temporal graph** captures the interaction between objects.
The algorithm is composed of several modules:
1. **Inflated 3D (I3D) network**. In essence, it is a standard 2D CNN (e.g. ResNet-50) converted to a 3D CNN by copying the 2D weights along an additional dimension and subsequent renormalizatio...
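The weight-inflation step can be sketched in a few lines (toy kernel with channel dimensions omitted; function name hypothetical):

```python
import numpy as np

def inflate_2d_to_3d(w2d, t):
    """I3D-style inflation: tile a 2D conv kernel t times along a new
    temporal axis and divide by t, so the response to a temporally
    constant input matches the original 2D response."""
    return np.repeat(w2d[None, ...], t, axis=0) / t

w2d = np.ones((3, 3))            # toy 3x3 spatial kernel
w3d = inflate_2d_to_3d(w2d, t=4)
assert w3d.shape == (4, 3, 3)
assert np.allclose(w3d.sum(axis=0), w2d)  # scale preserved after inflation
```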
http://www.shortscience.org/paper?bibtexKey=10.1007/978-3-030-01228-1_25#ukrdailo
http://www.shortscience.org/paper?bibtexKey=10.1007/978-3-030-01228-1_25#ukrdailoSun, 13 Oct 2019 04:52:33 +0000conf/icml/YanDMW033Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney StatisticPrateek GuptaIn a binary classification task on an imbalanced dataset, we often report the *area under the curve* (AUC) of the *receiver operating characteristic* (ROC) as a measure of the classifier's ability to distinguish the two classes.
If there are $k$ errors, accuracy will be the same irrespective of how those $k$ errors are made, i.e. misclassification of positive samples or of negative samples.
AUC-ROC is a metric that treats these misclassifications asymmetrically, making it an appropriate statistic for cla...
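A minimal sketch of AUC computed directly as the Wilcoxon-Mann-Whitney statistic (function name hypothetical); the paper's contribution is replacing the non-differentiable pairwise indicator below with a smooth surrogate that can be optimized by gradient descent:

```python
import numpy as np

def wmw_auc(scores_pos, scores_neg):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (positive, negative) score pairs ranked correctly; ties count 1/2."""
    sp = np.asarray(scores_pos, dtype=float)[:, None]
    sn = np.asarray(scores_neg, dtype=float)[None, :]
    return ((sp > sn) + 0.5 * (sp == sn)).mean()

# A classifier scoring every positive above every negative has AUC 1.
assert wmw_auc([0.9, 0.8], [0.1, 0.2]) == 1.0
# Identical scores for both classes give the chance level 0.5.
assert wmw_auc([0.5], [0.5]) == 0.5
```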
http://www.shortscience.org/paper?bibtexKey=conf/icml/YanDMW03#prateekgupta
http://www.shortscience.org/paper?bibtexKey=conf/icml/YanDMW03#prateekguptaMon, 30 Sep 2019 18:43:59 +00001909.04630journals/corr/1909.046304Meta-Learning with Implicit GradientsPrateek GuptaThis paper builds upon the previous work in gradient-based meta-learning methods.
The objective of meta-learning is to find meta-parameters ($\theta$) which can be "adapted" to yield "task-specific" ($\phi$) parameters.
Thus, $\theta$ and $\phi$ lie in the same hyperspace.
A meta-learning problem deals with several tasks, where each task is specified by its respective training and test datasets.
At the inference time of gradient-based meta-learning methods, before the start of each task, one ...
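A rough numpy sketch of the inner-loop adaptation such methods perform (all names, and the proximal regularizer that implicit-gradient methods rely on, are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def adapt(theta, grad_task, inner_steps=100, lr=0.1, reg=1.0):
    """Adapt meta-parameters theta into task parameters phi by descending
    the task loss plus a proximal term (reg/2)*||phi - theta||^2."""
    phi = theta.copy()
    for _ in range(inner_steps):
        phi -= lr * (grad_task(phi) + reg * (phi - theta))
    return phi

# Toy task loss 0.5*||phi - target||^2, so grad_task(phi) = phi - target.
theta = np.zeros(2)
target = np.array([1.0, -1.0])
phi = adapt(theta, lambda p: p - target)
# The fixed point balances task loss and proximity: phi = target / 2.
assert np.allclose(phi, target / 2, atol=0.05)
```

Note how $\phi$ stays in the same space as $\theta$, as the summary observes.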
http://www.shortscience.org/paper?bibtexKey=journals/corr/1909.04630#prateekgupta
http://www.shortscience.org/paper?bibtexKey=journals/corr/1909.04630#prateekguptaSat, 21 Sep 2019 22:14:45 +00001904.07846journals/corr/abs-1904-078464Temporal Cycle-Consistency Learningjerpint# Overview
This paper presents a novel way to align frames in videos of similar actions temporally in a self-supervised setting. To do so, they leverage the concept of cycle-consistency. They introduce two formulations of cycle-consistency which are differentiable and solvable using standard gradient descent approaches. They name their method Temporal Cycle Consistency (TCC). They introduce a dataset that they use to evaluate their approach and show that their learned embeddings allow for few ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-07846#jeremypinto
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-07846#jeremypintoFri, 20 Sep 2019 19:27:45 +00001710.10571journals/corr/1710.105713Certifying Some Distributional Robustness with Principled Adversarial TrainingJan RocketManA novel method for adversarially-robust learning with theoretical guarantees under small perturbations.
1) Given the default distribution $P_0$, it defines a neighborhood of $P_0$ as the set of distributions which are $\rho$-close to $P_0$ in terms of the Wasserstein metric with a predefined cost function $c$ (e.g. $L_2$);
2) Formulates the robust learning problem as minimizing the expected loss under the worst-case distribution in this neighborhood, and proposes a Lagrangian relaxation of it;
3) Given it, provides a data-dependent upper bound on...
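A toy numpy sketch of the relaxed inner problem (all names hypothetical): for a fixed multiplier $\gamma$, the worst-case perturbation ascends the loss minus a quadratic transport cost anchored at the clean point:

```python
import numpy as np

def wrm_perturb(x, grad_loss, gamma=1.0, steps=50, lr=0.1):
    """Gradient ascent on  loss(z) - (gamma/2)*||z - x||^2  starting
    from the clean input x; grad_loss(z) returns d loss / d z."""
    z = x.copy()
    for _ in range(steps):
        z += lr * (grad_loss(z) - gamma * (z - x))
    return z

# Toy loss(z) = w . z with constant gradient w: the stationary point of
# w.z - ||z - x||^2 / 2 is z = x + w.
w = np.array([1.0, -2.0])
z = wrm_perturb(np.zeros(2), lambda z: w, gamma=1.0)
assert np.allclose(z, w, atol=1e-1)
```

The quadratic cost keeps the inner maximization strongly concave for large enough $\gamma$, which is what makes the relaxation tractable.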
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.10571#janrocketman
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.10571#janrocketmanThu, 12 Sep 2019 12:38:11 +00001602.04938journals/corr/1602.049384"Why Should I Trust You?": Explaining the Predictions of Any ClassifierApoorva ShettyAlthough machine learning models have been widely accepted as the next step towards simplifying complex problems, the inner workings of a machine learning model are still unclear, and clarifying these details can increase trust in both the model's predictions and the model itself.
**Idea:** A good explanation system that can justify the prediction of a classifier and help diagnose the reasoning behind a model can greatly raise one’s trust in the predictive model.
**Solution:** T...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.04938#apoorvashetty
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.04938#apoorvashettyTue, 10 Sep 2019 12:31:58 +00001810.03292journals/corr/1810.032923Sanity Checks for Saliency MapsApoorva Shetty**Idea:** With the growing use of visual explanation systems of machine learning models such as saliency maps, there needs to be a standardized method of verifying if a saliency method is correctly describing the underlying ML model.
**Solution:** In this paper, two sanity checks are proposed to verify the accuracy and the faithfulness of a saliency method:
* *Model parameter randomization test:* In this sanity check, the output of a saliency method on a trained model is compared to that o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.03292#apoorvashetty
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.03292#apoorvashettyWed, 04 Sep 2019 15:16:21 +00001907.02057journals/corr/abs-1907-020573Benchmarking Model-Based Reinforcement Learningdav1309This is not a detailed summary, just general notes:
The authors make an excellent and extensive comparison of model-free and model-based methods in 18 environments. In general, they compare 3 classes of Model-Based Reinforcement Learning (MBRL) algorithms, using as the comparison metric the total return in the environment after 200K steps (reporting the mean and std over windows of 5000 steps throughout training, averaged across 4 seeds for each algorithm). They compare MBRL ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1907-02057#dav1309
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1907-02057#dav1309Tue, 27 Aug 2019 15:39:34 +00001312.6211journals/corr/1312.62115An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural NetworksAndrea Walter RuggeriniThe paper discusses and empirically investigates the effect of "catastrophic forgetting" (**CF**), i.e. the inability of a model to perform a task it was previously trained to perform if retrained to perform a second task.
An illuminating example is what happens in ML systems with convex objectives: regardless of the initialization (i.e. of what was learnt by doing the first task), the training of the second task will always end in the global minimum, thus totally "forgett...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1312.6211#andreaw
http://www.shortscience.org/paper?bibtexKey=journals/corr/1312.6211#andreawMon, 26 Aug 2019 12:36:51 +0000conf/icra/MiliotoMS193Fast Instance and Semantic Segmentation Exploiting Local Connectivity, Metric Learning, and One-Shot Detection for RoboticsHadrien BertrandThe paper proposes a method to perform joint instance and semantic segmentation. The method is fast, as it is meant to run in an embedded environment (such as a robot). While the semantic map may seem redundant given the instance one, it is not, as semantic segmentation is a key part of obtaining the instance map.
# Architecture
![image]()
The image is first put through a typical CNN encoder (specifically a ResNet derivative), followed by 3 separate decoders. The output of the decoder is at a l...
http://www.shortscience.org/paper?bibtexKey=conf/icra/MiliotoMS19#hbertrand
http://www.shortscience.org/paper?bibtexKey=conf/icra/MiliotoMS19#hbertrandMon, 19 Aug 2019 19:30:51 +00001908.04742journals/corr/1908.047427Online Continual Learning with Maximally Interfered RetrievalMassimo CacciaDisclaimer: I am an author
# Intro
Experience replay (ER) and generative replay (GEN) are two effective continual learning strategies. In the former, samples from a stored memory are replayed to the continual learner to reduce forgetting. In the latter, old data is compressed with a generative model and generated data is replayed to the continual learner. Both of these strategies assume a random sampling of the memories. But learning a new task doesn't cause **equal** interference (forgetting)...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1908.04742#mcaccia
http://www.shortscience.org/paper?bibtexKey=journals/corr/1908.04742#mcacciaWed, 14 Aug 2019 14:49:54 +00001810.01392journals/corr/1810.013923WAIC, but Why? Generative Ensembles for Robust Anomaly DetectionMassimo Caccia### Summary
Knowing when a model is qualified to make a prediction is critical to safe deployment of ML technology. Model-independent / Unsupervised Out-of-Distribution (OoD) detection is appealing mostly because it doesn't require task-specific labels to train. It is tempting to suggest a simple one-tailed test in which lower likelihoods are OoD (assigned by a Likelihood Model), but the intuition that In-Distribution (ID) inputs should have highest likelihoods _does not hold in higher dimension...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.01392#mcaccia
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.01392#mcacciaThu, 01 Aug 2019 22:45:16 +00001905.04610lundberg2019explainable3Explainable AI for Trees: From Local Explanations to Global UnderstandingApoorva ShettyTree-based ML models are becoming increasingly popular, but the explanation space for these types of models is woefully lacking in local-level explanations. Local-level explanations can give a clearer picture of specific use cases and help pinpoint exact areas where the ML model may be lacking in accuracy.
**Idea**: We need a local explanation system for trees that is not based on the simple decision path, but rather weighs each feature against every other feature to gain better insig...
http://www.shortscience.org/paper?bibtexKey=lundberg2019explainable#apoorvashetty
http://www.shortscience.org/paper?bibtexKey=lundberg2019explainable#apoorvashettyWed, 31 Jul 2019 18:23:34 +0000conf/icml/XuBKCCSZB153Show, Attend and Tell: Neural Image Caption Generation with Visual Attentionjerpint# Summary
The authors present a way to generate captions describing the content of images using attention-based mechanisms. They present two ways of training the network, one via standard backpropagation techniques and another using stochastic processes. They also show how their model can selectively "focus" on the relevant parts of an image to generate appropriate captions, as shown in the classic example of the famous woman throwing a frisbee. Finally, they validate their model on Flickr8k, ...
http://www.shortscience.org/paper?bibtexKey=conf/icml/XuBKCCSZB15#jeremypinto
http://www.shortscience.org/paper?bibtexKey=conf/icml/XuBKCCSZB15#jeremypintoThu, 25 Jul 2019 19:00:11 +000010.1109/cvpr.2018.006363Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answeringjerpint# Summary
This paper presents state-of-the-art methods for both caption generation of images and visual question answering (VQA). The authors build on previous methods by adding what they call a "bottom-up" approach to previous "top-down" attention mechanisms. They show that using their approach they obtain SOTA on both image captioning (MSCOCO) and Visual Question Answering (the 2017 VQA challenge). They propose specific network configurations for each. Their biggest contribution is usin...
http://www.shortscience.org/paper?bibtexKey=10.1109/cvpr.2018.00636#jeremypinto
http://www.shortscience.org/paper?bibtexKey=10.1109/cvpr.2018.00636#jeremypintoThu, 25 Jul 2019 17:06:02 +00001507.08439journals/corr/Kula153Metadata Embeddings for User and Item Cold-start RecommendationsMartin ThomaThe idea is to combine collaborative filtering with content-based recommenders to mitigate the user and item cold-start problems.
The author distinguishes between positive and negative interactions.
The representation of a user and of items is the sum of all their latent representations. This sounds similar to "**Asymmetric factor models**" as described in [the BellKor Netflix prize solution](). **The key idea is to encode the latent user (or item) vector as a sum of latent attribute vectors.**...
http://www.shortscience.org/paper?bibtexKey=journals/corr/Kula15#martinthoma
http://www.shortscience.org/paper?bibtexKey=journals/corr/Kula15#martinthomaTue, 23 Jul 2019 14:01:54 +0000koren:icdm083Collaborative Filtering for Implicit Feedback DatasetsMartin ThomaThis paper is about a recommendation system approach using collaborative filtering (CF) on implicit feedback datasets.
The core of it is the minimization problem
$$\min_{x_*, y_*} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \underbrace{\lambda \left ( \sum_u || x_u ||^2 + \sum_i || y_i ||^2\right )}_{\text{Regularization}}$$
with
* $\lambda \in [0, \infty[$ is a hyperparameter which defines how strongly the model is regularized
* $u$ denotes a user; $x_*$ are all user factors $x_u$ combined
*...
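The objective above can be evaluated directly; a toy numpy sketch (variable names hypothetical; the confidence weighting $c_{ui} = 1 + \alpha r_{ui}$ with binarized preferences and $\alpha = 40$ is one illustrative choice):

```python
import numpy as np

def implicit_cf_loss(X, Y, C, P, lam=0.1):
    """sum_{u,i} c_ui (p_ui - x_u^T y_i)^2 + lam (||X||^2 + ||Y||^2),
    with user factors x_u as rows of X and item factors y_i as rows of Y."""
    pred = X @ Y.T  # x_u^T y_i for every (user, item) pair
    return (C * (P - pred) ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))                    # 4 users, 2 latent factors
Y = rng.normal(size=(3, 2))                    # 3 items
P = (rng.random((4, 3)) > 0.5).astype(float)   # binarized preference p_ui
C = 1.0 + 40.0 * P                             # confidence c_ui
loss = implicit_cf_loss(X, Y, C, P)
assert loss > 0
```

Unlike explicit-feedback models, the sum runs over *all* user-item pairs, which is why the alternating least squares tricks in the paper matter.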
http://www.shortscience.org/paper?bibtexKey=koren:icdm08#martinthoma
http://www.shortscience.org/paper?bibtexKey=koren:icdm08#martinthomaTue, 23 Jul 2019 06:09:59 +0000conf/nips/AdebayoGMGHK185Sanity Checks for Saliency MapsHadrien BertrandThe paper designs some basic tests to compare saliency methods. It finds that some of the most popular methods are independent of model parameters and the data, meaning they are effectively useless.
## Methods compared
The paper compares the following methods: gradient explanation, gradient x input, integrated gradients, guided backprop, guided GradCam and SmoothGrad. They provide a refresher on those methods in the appendix.
All those methods can be put in the same framework. They require a ...
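That shared framework can be sketched as follows (a toy illustration, not the paper's code; the SmoothGrad noise level and sample count are arbitrary choices): each method starts from the gradient of the class score with respect to the input and post-processes it.

```python
import numpy as np

def saliency(x, grad_fn, method="gradient"):
    """grad_fn(x) returns the gradient of the class score w.r.t. x."""
    if method == "gradient":
        return grad_fn(x)
    if method == "gradient_x_input":
        return grad_fn(x) * x
    if method == "smoothgrad":  # average gradients under input noise
        rng = np.random.default_rng(0)
        return np.mean([grad_fn(x + rng.normal(0, 0.1, x.shape))
                        for _ in range(25)], axis=0)
    raise ValueError(method)

# For a linear score w.x the gradient is w everywhere, which makes the
# sanity checks easy to see: randomizing w must change the map.
w = np.array([2.0, -1.0, 0.5])
x = np.ones(3)
assert np.allclose(saliency(x, lambda z: w), w)
assert np.allclose(saliency(x, lambda z: w, "gradient_x_input"), w * x)
```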
http://www.shortscience.org/paper?bibtexKey=conf/nips/AdebayoGMGHK18#hbertrand
http://www.shortscience.org/paper?bibtexKey=conf/nips/AdebayoGMGHK18#hbertrandWed, 17 Jul 2019 20:19:14 +000010.1007/s10994-011-5268-14Robustness and generalizationDavid StutzXu and Mannor provide a theoretical paper on robustness and generalization where their notion of robustness is based on the idea that the difference in loss should be small for samples that are close. This implies that, e.g., for a test sample close to a training sample, the loss on both samples should be similar. The authors formalize this notion as follows:
Definition: Let $A$ be a learning algorithm and $S \subset Z$ be a training set such that $A(S)$ denotes the model learned on $S$ by $A$;...
http://www.shortscience.org/paper?bibtexKey=10.1007/s10994-011-5268-1#davidstutz
http://www.shortscience.org/paper?bibtexKey=10.1007/s10994-011-5268-1#davidstutzTue, 16 Jul 2019 17:19:43 +00001809.03113journals/corr/abs-1809-031133Second-Order Adversarial Attack and Certifiable RobustnessDavid StutzLi et al. propose an adversarial attack motivated by second-order optimization and uses input randomization as defense. Based on a Taylor expansion, the optimal adversarial perturbation should be aligned with the dominant eigenvector of the Hessian matrix of the loss. As the eigenvectors of the Hessian cannot be computed efficiently, the authors propose an approximation; this is mainly based on evaluating the gradient under Gaussian noise. The gradient is then normalized before taking a projecte...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1809-03113#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1809-03113#davidstutzTue, 16 Jul 2019 17:13:29 +00001802.03471journals/corr/1802.034715Certified Robustness to Adversarial Examples with Differential PrivacyDavid StutzLecuyer et al. propose a defense against adversarial examples based on differential privacy. Their main insight is that a differential private algorithm is also robust to slight perturbations. In practice, this amounts to injecting noise in some layer (or on the image directly) and using Monte Carlo estimation for computing the expected prediction. The approach is compared to adversarial training against the Carlini+Wagner attack.
Also find this summary at [davidstutz.de]().
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.03471#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.03471#davidstutzTue, 16 Jul 2019 16:53:19 +0000geirhos2018imagenettrained3ImageNet-trained {CNN}s are biased towards texture; increasing shape bias improves accuracy and robustnessDavid StutzGeirhos et al. show that state-of-the-art convolutional neural networks put too much importance on texture information. This claim is confirmed in a controlled study comparing convolutional neural network and human performance on variants of ImageNet images with removed texture (silhouettes) or on edges. Additionally, networks only considering local information can perform nearly as well as other networks. To avoid this bias, they propose a stylized ImageNet variant where textures are replaced ra...
http://www.shortscience.org/paper?bibtexKey=geirhos2018imagenettrained#davidstutz
http://www.shortscience.org/paper?bibtexKey=geirhos2018imagenettrained#davidstutzTue, 16 Jul 2019 16:36:24 +00001904.00760journals/corr/abs-1904-007605Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNetDavid StutzBrendel and Bethge show empirically that state-of-the-art deep neural networks on ImageNet rely to a large extent on local features, without any notion of interaction between them. To this end, they propose a bag-of-local-features model by applying a ResNet-like architecture on small patches of ImageNet images. The predictions of these local features are then averaged and a linear classifier is trained on top. Due to the locality, this model allows to inspect which areas in an image contribute t...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-00760#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-00760#davidstutzTue, 16 Jul 2019 16:10:57 +00001906.06316journals/corr/abs-1906-063163Towards Stable and Efficient Training of Verifiably Robust Neural NetworksDavid StutzZhang et al. combine interval bound propagation and CROWN, both approaches to obtain bounds on a network’s output, to efficiently train robust networks. Both interval bound propagation (IBP) and CROWN allow bounding a network’s output for a specific set of allowed perturbations around clean input examples. These bounds can be used for adversarial training. The motivation to combine CROWN and IBP stems from the fact that training using IBP bounds usually results in instabilities, while traini...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-06316#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-06316#davidstutzTue, 16 Jul 2019 16:01:19 +0000conf/nips/ZhangWCHD183Efficient Neural Network Robustness Certification with General Activation FunctionsDavid StutzZhang et al. propose CROWN, a method for certifying adversarial robustness based on bounding activations functions using linear functions. Informally, the main result can be stated as follows: if the activation functions used in a deep neural network can be bounded above and below by linear functions (the activation function may also be segmented first), the network output can also be bounded by linear functions. These linear functions can be computed explicitly, as stated in the paper. Then, gi...
http://www.shortscience.org/paper?bibtexKey=conf/nips/ZhangWCHD18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/nips/ZhangWCHD18#davidstutzTue, 16 Jul 2019 15:55:18 +00001901.01672journals/corr/abs-1901-016723Generalization in Deep Networks: The Role of Distance from InitializationDavid StutzNagarajan and Kolter show that neural networks are implicitly regularized by stochastic gradient descent to have small distance from their initialization. This implicit regularization may explain the good generalization performance of over-parameterized neural networks; specifically, more complex models usually generalize better, which contradicts the general trade-off between expressivity and generalization in machine learning. On MNIST, the authors show that the distance of the network’s par...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1901-01672#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1901-01672#davidstutzTue, 16 Jul 2019 15:51:29 +00001810.12715journals/corr/1810.127153On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust ModelsDavid StutzGowal et al. propose interval bound propagation to obtain certified robustness against adversarial examples. In particular, given a neural network consisting of linear layers and monotonic increasing activation functions, a set of allowed perturbations is propagated to obtain upper and lower bounds at each layer. These lead to bounds on the logits of the network; these are used to verify whether the network changes its prediction on the allowed perturbations. Specifically, Gowal et al. consider ...
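The propagation step for a single linear layer followed by a monotone activation is simple enough to sketch (hypothetical helper names, not the paper's code):

```python
import numpy as np

def ibp_linear(l, u, W, b):
    """Propagate elementwise bounds [l, u] through x -> Wx + b by
    splitting W into positive and negative parts, so each output bound
    combines the correct ends of the input interval."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    return (W_pos @ l + W_neg @ u + b,   # lower bound
            W_pos @ u + W_neg @ l + b)   # upper bound

def ibp_relu(l, u):
    """Monotone increasing activations map bounds elementwise."""
    return np.maximum(l, 0), np.maximum(u, 0)

# x0 - x1 over the unit box [0,1]^2 lies in [-1, 1].
W = np.array([[1.0, -1.0]]); b = np.zeros(1)
l, u = ibp_linear(np.zeros(2), np.ones(2), W, b)
assert l[0] == -1.0 and u[0] == 1.0
```

Chaining these two functions layer by layer yields the logit bounds used for verification and training.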
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.12715#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.12715#davidstutzTue, 16 Jul 2019 15:47:34 +00001905.02161journals/corr/abs-1905-021613Batch Normalization is a Cause of Adversarial VulnerabilityDavid StutzGalloway et al. argue that batch normalization reduces robustness against noise and adversarial examples. On various vision datasets, including SVHN and ImageNet, with popular self-trained and pre-trained models they empirically demonstrate that networks with batch normalization show reduced accuracy on noise and adversarial examples. As noise, they consider Gaussian additive noise as well as different noise types included in the Cifar-C dataset. Similarly, for adversarial examples, they conside...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1905-02161#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1905-02161#davidstutzTue, 16 Jul 2019 15:41:54 +0000journals/cejcs/DashBDC163Radial basis function neural networks: a topical state-of-the-art surveyDavid StutzDash et al. present a reasonably recent survey on radial basis function (RBF) networks. RBF networks can be understood as two-layer perceptrons, consisting of an input layer, a hidden layer and an output layer. Instead of using a linear operation to compute the hidden layer, RBF kernels are used; as a simple example, the hidden units are computed as
$h_i = \phi_i(x) = \exp\left(-\frac{\|x - \mu_i\|^2}{2\sigma_i^2}\right)$
where $\mu_i$ and $\sigma_i^2$ are parameters of the kernel. In a clust...
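The hidden layer above can be written directly (a minimal sketch; function name hypothetical):

```python
import numpy as np

def rbf_hidden(x, mus, sigmas):
    """h_i = exp(-||x - mu_i||^2 / (2 sigma_i^2)) for each hidden unit i."""
    d2 = ((x[None, :] - mus) ** 2).sum(axis=1)  # squared distance to each centre
    return np.exp(-d2 / (2 * sigmas ** 2))

mus = np.array([[0.0, 0.0], [1.0, 1.0]])  # two unit centres mu_i
sigmas = np.array([1.0, 1.0])
h = rbf_hidden(np.zeros(2), mus, sigmas)
assert h[0] == 1.0      # at its centre a unit saturates at 1
assert 0 < h[1] < 1     # and decays with distance
```

The output layer is then an ordinary linear combination of these activations.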
http://www.shortscience.org/paper?bibtexKey=journals/cejcs/DashBDC16#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/cejcs/DashBDC16#davidstutzSun, 14 Jul 2019 17:38:25 +00001903.11257journals/corr/abs-1903-112573How Can We Be So Dense? The Benefits of Using Highly Sparse RepresentationsDavid StutzAhmad and Scheinkman propose a simple sparse layer in order to improve robustness against random noise. Specifically, considering a general linear network layer, i.e.
$\hat{y}^l = W^l y^{l-1} + b^l$ and $y^l = f(\hat{y}^l)$
where $f$ is an activation function, the weights are first initialized using a sparse distribution; then, the activation function (commonly ReLU) is replaced by a top-$k$ ReLU version where only the top-$k$ activations are propagated. In experiments, this is shown to improve...
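A minimal numpy sketch of the described layer (names hypothetical; ties may keep slightly more than $k$ units, which this sketch ignores):

```python
import numpy as np

def sparse_linear_layer(x, W, b, k):
    """Linear map followed by a top-k ReLU: only the k largest
    (non-negative) activations are propagated, the rest are zeroed."""
    out = np.maximum(W @ x + b, 0.0)        # standard ReLU
    if k < out.size:
        thresh = np.partition(out, -k)[-k]  # k-th largest value
        out[out < thresh] = 0.0
    return out

W = np.eye(4); b = np.zeros(4)
y = sparse_linear_layer(np.array([3.0, 1.0, 2.0, -1.0]), W, b, k=2)
assert np.count_nonzero(y) <= 2
assert y[0] == 3.0 and y[2] == 2.0
```

Sparse weight initialization (drawing most entries of `W` as zero) would be layered on top of this at construction time.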
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-11257#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-11257#davidstutzSun, 14 Jul 2019 17:29:34 +00001812.03190journals/corr/abs-1812-031903Deep-RBF Networks Revisited: Robust Classification with RejectionDavid StutzZadeh et al. propose a layer similar to radial basis functions (RBFs) to increase a network’s robustness against adversarial examples by rejection. Based on a deep feature extractor, the RBF units compute
$d_k(x) = \|A_k^Tx + b_k\|_p^p$
with parameters $A_k$ and $b_k$. The decision rule remains unchanged, but the output does not resemble probabilities anymore. The full network, i.e., feature extractor and RBF layer, is trained using an adapted loss that resembles a max margin loss:
$J = \sum_i ...
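The per-class distance unit is easy to state in code (a sketch with hypothetical names; the prediction rule picks the class with the smallest distance):

```python
import numpy as np

def rbf_unit(x, A_k, b_k, p=2):
    """d_k(x) = ||A_k^T x + b_k||_p^p for class k, applied to the
    deep feature vector x."""
    return (np.abs(A_k.T @ x + b_k) ** p).sum()

rng = np.random.default_rng(0)
x = rng.normal(size=3)
A = rng.normal(size=(3, 5)); b = rng.normal(size=5)
d = rbf_unit(x, A, b)
assert d >= 0   # distances are non-negative by construction
```

Rejection then amounts to abstaining when even the smallest $d_k(x)$ exceeds a threshold.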
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1812-03190#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1812-03190#davidstutzSun, 14 Jul 2019 17:25:34 +00001809.09262journals/corr/1809.092623Neural Networks with Structural Resistance to Adversarial AttacksDavid StutzDe Alfaro proposes a deep radial basis function (RBF) network to obtain robustness against adversarial examples. In contrast to “regular” RBF networks, which usually consist of only one hidden layer containing RBF units, de Alfaro proposes to stack multiple layers with RBF units. Specifically, a Gaussian unit utilizing the $L_\infty$ norm is used:
$\exp\left( - \max_i(u_i(x_i - w_i))^2\right)$
where $u_i$ and $w_i$ are parameters and $x_i$ are the inputs to the unit, so the network in...
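The unit above is a one-liner in numpy; the parameter values in the usage example are arbitrary:

```python
import numpy as np

def linf_gaussian_unit(x, u, w):
    """Gaussian unit with the L-infinity norm from the summary above:
    output is 1 when x = w and decays with the single worst-matching coordinate."""
    return np.exp(-np.max((u * (x - w)) ** 2))

w = np.array([0.5, -0.5])
u = np.array([2.0, 1.0])
print(linf_gaussian_unit(w, u, w))                         # exact match → 1.0
print(linf_gaussian_unit(w + np.array([1.0, 0.0]), u, w))  # → exp(-4)
```

Because the maximum over coordinates dominates, a large deviation in any single input dimension is enough to drive the unit's output toward zero.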
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.09262#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.09262#davidstutzSun, 14 Jul 2019 17:21:11 +00001905.02175ilyas2019adversarial4Adversarial Examples Are Not Bugs, They Are FeaturesDavid StutzIlyas et al. present a follow-up work to their paper on the trade-off between accuracy and robustness. Specifically, given a feature $f(x)$ computed from input $x$, the feature is considered predictive if
$\mathbb{E}_{(x,y) \sim \mathcal{D}}[y f(x)] \geq \rho$;
similarly, a predictive feature is robust if
$\mathbb{E}_{(x,y) \sim \mathcal{D}}\left[\inf_{\delta \in \Delta(x)} yf(x + \delta)\right] \geq \gamma$.
This means that a feature is considered robust if the worst-case correlation with the l...
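A toy one-dimensional check of the two definitions, under assumptions of mine (labels $y \in \{-1,+1\}$, $x = y + \text{noise}$, feature $f(x) = x$, and an $L_\infty$ ball of radius $\varepsilon$ for $\Delta(x)$):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.choice([-1.0, 1.0], size=100_000)
x = y + rng.normal(scale=0.5, size=y.size)

# predictive: E[y f(x)], here approximately 1 since f(x) = x correlates with y
rho = np.mean(y * x)

# robust: worst case over |delta| <= eps; for f(x) = x the infimum of
# y (x + delta) is attained at delta = -eps * y
eps = 0.25
gamma = np.mean(y * (x - eps * y))
print(rho, gamma)  # gamma = rho - eps exactly in this toy setting
```

Even this toy case shows the gap: a feature can be strongly predictive ($\rho \approx 1$) while its robust correlation $\gamma$ shrinks with the allowed perturbation budget.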
http://www.shortscience.org/paper?bibtexKey=ilyas2019adversarial#davidstutz
http://www.shortscience.org/paper?bibtexKey=ilyas2019adversarial#davidstutzSun, 14 Jul 2019 17:13:32 +00001903.12269journals/corr/abs-1903-122693Bit-Flip Attack: Crushing Neural Network with Progressive Bit SearchDavid StutzRakin et al. introduce the bit-flip attack, aimed at degrading a network’s performance by flipping a few weight bits. On Cifar10 and ImageNet, common architectures such as ResNets or AlexNet are quantized into 8 bits per weight value (or fewer). Then, on a subset of the validation set, gradients with respect to the training loss are computed and, in each layer, bits are selected based on their gradient value. Afterwards, the layer which incurs the maximum increase in training loss is selected. Thi...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-12269#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-12269#davidstutzSun, 14 Jul 2019 17:05:05 +0000conf/miccai/ZhangYCFHC175Deep Adversarial Networks for Biomedical Image Segmentation Utilizing Unannotated ImagesJoseph Paul CohenThis work improves the performance of a segmentation network by utilizing unlabelled data. They use a discriminator (they call EN) to distinguish between annotated and unannotated examples. They then train the segmentation generator (they call SN) based on what will fool the discriminator.
Three training phases are shown above
This work is really great. They are using the segmentation to condition the discriminator which will learn to point out flaws when applying the segmentation to the un...
http://www.shortscience.org/paper?bibtexKey=conf/miccai/ZhangYCFHC17#joecohen
http://www.shortscience.org/paper?bibtexKey=conf/miccai/ZhangYCFHC17#joecohenSun, 14 Jul 2019 16:04:19 +00001907.03626journals/corr/1907.036263Benchmarking Deep Learning Hardware and Frameworks: Qualitative MetricsWei Dai
Previous papers on benchmarking deep neural networks offer knowledge of deep learning hardware devices and software frameworks. This paper introduces benchmarking principles, surveys machine learning devices including GPUs, FPGAs, and ASICs, and reviews deep learning software frameworks. It also qualitatively compares these technologies with respect to benchmarking from the angles of our 7-metric approach to deep learning ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1907.03626#weidai
http://www.shortscience.org/paper?bibtexKey=journals/corr/1907.03626#weidaiFri, 12 Jul 2019 02:41:50 +00001711.09883journals/corr/1711.098833AI Safety GridworldsdnikuThe paper proposes a standardized benchmark for a number of safety-related problems, and provides an implementation that can be used by other researchers. The problems fall into two categories: specification and robustness. Specification refers to cases where it is difficult to specify a reward function that encodes our intentions. Robustness means that an agent's actions should be robust when facing various complexities of a real-world environment. Here is a list of problems:
1. Specification:
1....
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09883#dniku
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09883#dnikuThu, 11 Jul 2019 14:01:20 +00001803.03635journals/corr/1803.036354The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural NetworksDavid StutzFrankle and Carbin discover so-called winning tickets, subsets of weights of a neural network that are sufficient to obtain state-of-the-art accuracy. The lottery ticket hypothesis states that dense networks contain subnetworks, the winning tickets, that can reach the same accuracy when trained in isolation, from scratch. The key insight is that these subnetworks seem to have received optimal initialization. Then, given a complex trained network for, e.g., Cifar, weights are pruned based on their ...
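A minimal sketch of the prune-and-reset step described above, assuming one-shot magnitude pruning on a single weight matrix (names and shapes are mine):

```python
import numpy as np

def winning_ticket(w_init, w_trained, prune_frac):
    """Keep the largest-magnitude trained weights, but reset the surviving
    weights to their ORIGINAL initialization -- the candidate winning ticket."""
    thresh = np.quantile(np.abs(w_trained), prune_frac)
    mask = np.abs(w_trained) >= thresh
    return w_init * mask, mask

rng = np.random.default_rng(1)
w_init = rng.normal(size=(4, 4))
w_trained = w_init + rng.normal(scale=0.5, size=(4, 4))
ticket, mask = winning_ticket(w_init, w_trained, prune_frac=0.5)
print(mask.mean())  # roughly half the weights survive
```

The subtlety the hypothesis hinges on is the reset: retraining the ticket from `w_init` (not from `w_trained`, and not from a fresh random draw) is what recovers the original accuracy.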
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.03635#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.03635#davidstutzTue, 09 Jul 2019 19:50:56 +00001902.02918journals/corr/1902.029183Certified Adversarial Robustness via Randomized SmoothingDavid StutzCohen et al. study robustness bounds of randomized smoothing, a region-based classification scheme where the prediction is averaged over Gaussian samples around the test input. Specifically, given a test input, the predicted class is the class whose decision region has the largest overlap with a normal distribution of pre-defined variance. The intuition of this approach is that, for small perturbations, the decision regions of classes can’t vary too much. In practice, randomized smoothing is a...
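The smoothed prediction step can be sketched as a majority vote over Gaussian-perturbed copies of the input; the base classifier below is a toy stand-in, and the certified-radius computation from the paper is omitted:

```python
import numpy as np

def smoothed_predict(f, x, sigma=0.5, n=1000, num_classes=2, seed=0):
    """Predict the class whose decision region captures most of a Gaussian
    around x, estimated by n Monte Carlo samples."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=(n,) + x.shape)
    votes = np.bincount([f(x + d) for d in noise], minlength=num_classes)
    return int(np.argmax(votes))

f = lambda z: int(z[0] > 0)  # toy base classifier: sign of the first coordinate
print(smoothed_predict(f, np.array([0.3, 0.0])))  # → 1
```

Intuitively, a small perturbation of $x$ shifts the Gaussian only slightly, so the class with the largest overlap, and hence the smoothed prediction, is stable.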
http://www.shortscience.org/paper?bibtexKey=journals/corr/1902.02918#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1902.02918#davidstutzTue, 09 Jul 2019 19:44:07 +00001706.02690journals/corr/1706.026903Enhancing The Reliability of Out-of-distribution Image Detection in Neural NetworksDavid StutzLiang et al. propose a perturbation-based approach for detecting out-of-distribution examples using a network’s confidence predictions. In particular, the approach is based on the observation that neural networks make more confident predictions on images from the original data distribution, in-distribution examples, than on examples taken from a different distribution (i.e., a different dataset), out-distribution examples. This effect can further be amplified by using a temperature-scaled so...
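The temperature-scaled confidence score can be sketched as below; the temperature value is an assumption for illustration, and the input-perturbation step of the method is omitted:

```python
import numpy as np

def temp_scaled_confidence(logits, T=1000.0):
    """Maximum softmax probability after dividing the logits by a
    temperature T; used as an in- vs. out-of-distribution score."""
    z = logits / T
    z = z - z.max()                      # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p.max()

in_dist = np.array([10.0, 1.0, 0.5])     # confident prediction
out_dist = np.array([3.0, 2.9, 3.1])     # near-uniform prediction
print(temp_scaled_confidence(in_dist) > temp_scaled_confidence(out_dist))  # → True
```

Thresholding this score then separates in-distribution from out-of-distribution inputs: examples whose confidence falls below the threshold are flagged as out-of-distribution.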
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.02690#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.02690#davidstutzTue, 09 Jul 2019 19:31:52 +00001511.06807journals/corr/1511.068073Adding Gradient Noise Improves Learning for Very Deep NetworksDavid StutzNeelakantan et al. study gradient noise for improving neural network training. In particular, they add Gaussian noise to the gradients in each iteration:
$\tilde{\nabla}f = \nabla f + \mathcal{N}(0, \sigma^2)$
where the variance $\sigma^2$ is adapted throughout training as follows:
$\sigma^2 = \frac{\eta}{(1 + t)^\gamma}$
where $\eta$ and $\gamma$ are hyper-parameters and $t$ the current iteration. In experiments, the authors show that gradient noise has the potential to improve accuracy, es...
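The update above is easy to implement; the particular values of $\eta$ and $\gamma$ below are illustrative, not a recommendation:

```python
import numpy as np

def noisy_grad(grad, t, eta=1.0, gamma=0.55, rng=None):
    """Add annealed Gaussian noise with variance eta / (1 + t)^gamma
    to a gradient, as in the summary above."""
    rng = rng or np.random.default_rng(0)
    sigma = np.sqrt(eta / (1 + t) ** gamma)
    return grad + rng.normal(scale=sigma, size=np.shape(grad))

g = np.zeros(3)
early = noisy_grad(g, t=0)       # sigma = 1: large exploratory noise
late = noisy_grad(g, t=10_000)   # sigma ≈ 0.08: noise has mostly decayed
```

The annealing matters: large noise early in training helps escape poor regions, while the decay ensures the noise does not prevent convergence later on.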
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.06807#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.06807#davidstutzTue, 09 Jul 2019 19:23:12 +0000conf/iclr/LeeLLS183Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution SamplesDavid StutzLee et al. propose a generative model for obtaining confidence-calibrated classifiers. Neural networks are known to be overconfident in their predictions – not only on examples from the task’s data distribution, but also on other examples taken from different distributions. The authors propose a GAN-based approach to force the classifier to predict uniform predictions on examples not taken from the data distribution. In particular, in addition to the target classifier, a generator and a disc...
http://www.shortscience.org/paper?bibtexKey=conf/iclr/LeeLLS18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/iclr/LeeLLS18#davidstutzTue, 09 Jul 2019 19:12:24 +00001901.04684journals/corr/abs-1901-046843The Limitations of Adversarial Training and the Blind-Spot AttackDavid StutzZhang et al. search for “blind spots” in the data distribution and show that blind spot test examples can be used to find adversarial examples easily. On MNIST, the data distribution is approximated using kernel density estimation where the distance metric is computed in dimensionality-reduced feature space (of an adversarially trained model). For dimensionality reduction, t-SNE is used. Blind spots are found by slightly shifting pixels or changing the gray value of the background. Based on t...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1901-04684#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1901-04684#davidstutzTue, 09 Jul 2019 19:02:32 +00001612.00334journals/corr/1612.003343A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial ExamplesDavid StutzWang et al. discuss an alternative definition of adversarial examples, taking into account an oracle classifier. Adversarial perturbations are usually constrained in their norm (e.g., $L_\infty$ norm for images); however, the main goal of this constraint is to ensure label invariance: if the image didn’t change notably, the label didn’t change either. As an alternative formulation, the authors consider an oracle for the task, e.g., humans for image classification tasks. Then, an adversarial ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1612.00334#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1612.00334#davidstutzTue, 09 Jul 2019 18:57:29 +000010.1145/3128572.31404513Towards Poisoning of Deep Learning Algorithms with Back-gradient OptimizationDavid StutzMunoz-Gonzalez et al. propose a multi-class data poisoning attack against deep neural networks based on back-gradient optimization. They consider the common poisoning formulation stated as follows:
$\max_{D_c} \min_w \mathcal{L}(D_c \cup D_{tr}, w)$
where $D_c$ denotes a set of poisoned training samples and $D_{tr}$ the corresponding clean dataset. Here, the loss $\mathcal{L}$ used for training is minimized as the inner optimization problem. As a result, as long as learning itself does not have ...
http://www.shortscience.org/paper?bibtexKey=10.1145/3128572.3140451#davidstutz
http://www.shortscience.org/paper?bibtexKey=10.1145/3128572.3140451#davidstutzTue, 09 Jul 2019 18:41:53 +0000conf/ccs/MengC173MagNet: A Two-Pronged Defense against Adversarial ExamplesDavid StutzMeng and Chen propose MagNet, a combination of adversarial example detection and removal. At test time, given a clean or adversarial test image, the proposed defense works as follows: First, the input is passed through one or multiple detectors. If one of these detectors fires, the input is rejected. To this end, the authors consider detection based on the reconstruction error of an auto-encoder or detection based on the divergence between probability predictions (on adversarial vs. clean exampl...
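The reconstruction-error detector can be sketched as follows; the "autoencoder" and threshold below are toy stand-ins, not the paper's trained models:

```python
import numpy as np

def reject(x, autoencoder, threshold):
    """Detector sketch: flag the input if its reconstruction error under an
    autoencoder trained on clean data exceeds a threshold."""
    err = np.mean((autoencoder(x) - x) ** 2)
    return err > threshold

# toy 'autoencoder': projects onto the first axis, so points on that
# manifold reconstruct perfectly while off-manifold points do not
ae = lambda z: np.array([z[0], 0.0])
print(reject(np.array([1.0, 0.0]), ae, threshold=0.01))  # clean → False
print(reject(np.array([1.0, 0.8]), ae, threshold=0.01))  # perturbed → True
```

The underlying assumption is that adversarial examples lie off the data manifold the autoencoder was trained on, so they reconstruct poorly while clean inputs do not.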
http://www.shortscience.org/paper?bibtexKey=conf/ccs/MengC17#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/ccs/MengC17#davidstutzTue, 09 Jul 2019 18:38:40 +00001707.01159journals/corr/SarkarBMC173UPSET and ANGRI : Breaking High Performance Image ClassifiersDavid StutzSarkar et al. propose two “learned” adversarial example attacks, UPSET and ANGRI. The former, UPSET, learns to predict universal, targeted adversarial examples. The latter, ANGRI, learns to predict (non-universal) targeted adversarial attacks. For UPSET, a network takes the target label as input and learns to predict a perturbation, which added to the original image results in mis-classification; for ANGRI, a network takes both the target label and the original image as input to predict a pe...
http://www.shortscience.org/paper?bibtexKey=journals/corr/SarkarBMC17#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/SarkarBMC17#davidstutzMon, 08 Jul 2019 19:49:38 +00001803.06959journals/corr/1803.069593On the importance of single directions for generalizationDavid StutzMorcos et al. study the influence of ablating single units as a proxy for generalization performance. On Cifar10, for example, an 11-layer convolutional network is trained on the clean dataset, as well as on versions of Cifar10 where a fraction $p$ of samples have corrupted labels. In the latter cases, the network is forced to memorize examples, as there is no inherent structure in the label assignment. Then, it is experimentally shown that these memorizing networks are less robust to setting who...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.06959#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.06959#davidstutzMon, 08 Jul 2019 19:47:59 +00001803.06978journals/corr/1803.069783Improving Transferability of Adversarial Examples with Input DiversityDavid StutzXie et al. propose to improve the transferability of adversarial examples by computing them based on transformed input images. In particular, they adapt I-FGSM such that, in each iteration, the update is computed on a transformed version of the current image with probability $p$. When attacking an ensemble of networks at the same time, this is shown to further improve transferability.
Also find this summary at [davidstutz.de]().
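One iteration of the diverse-input update can be sketched as below; the gradient function and the transform are toy stand-ins (the paper uses random resizing and padding of images):

```python
import numpy as np

def di_fgsm_step(x, grad_fn, transform, alpha=0.01, p=0.5, rng=None):
    """I-FGSM step where, with probability p, the gradient is evaluated
    on a transformed copy of the current image (input diversity)."""
    rng = rng or np.random.default_rng(0)
    x_in = transform(x) if rng.random() < p else x
    return x + alpha * np.sign(grad_fn(x_in))

x = np.zeros((4, 4))
grad_fn = lambda z: z + 1.0                                # toy loss gradient
transform = lambda z: np.pad(z[1:, 1:], ((0, 1), (0, 1)))  # crude crop-and-pad
x_adv = di_fgsm_step(x, grad_fn, transform)
print(np.abs(x_adv - x).max())  # each step moves pixels by at most alpha
```

Randomizing the input each iteration discourages the perturbation from overfitting to one network's exact input geometry, which is why it transfers better.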
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.06978#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.06978#davidstutzSat, 06 Jul 2019 11:53:26 +00001712.00699journals/corr/abs-1712-006993Improving Network Robustness against Adversarial Attacks with Compact ConvolutionDavid StutzRanjan et al. propose to constrain deep features to lie on hyperspheres in order to improve robustness against adversarial examples. For the last fully-connected layer, this is achieved by the L2-softmax, which forces the features to lie on the hypersphere. For intermediate convolutional or fully-connected layers, the same effect is achieved analogously, i.e., by normalizing inputs, scaling them and applying the convolution/weight multiplication. In experiments, the authors argue that this improv...
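The normalization step underlying the L2-softmax can be sketched in one line; the scale value below is an illustrative choice, not the paper's setting:

```python
import numpy as np

def to_hypersphere(x, alpha=10.0):
    """L2-normalize a feature vector and rescale it, so all features lie
    on a hypersphere of radius alpha."""
    return alpha * x / np.linalg.norm(x)

f = np.array([3.0, 4.0])
print(np.linalg.norm(to_hypersphere(f)))  # → 10.0
```

Fixing the feature norm removes magnitude as a degree of freedom, so the subsequent softmax depends only on feature direction, which is the constraint the summary describes.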
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1712-00699#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1712-00699#davidstutzSat, 06 Jul 2019 11:44:19 +0000