ShortScience.org Latest Summaries
http://www.shortscience.org/
Thu, 17 Jan 2019 13:01:02 +0700 | arXiv:1812.09916 | Improving MMD-GAN Training with Repulsive Loss Function | richard_wth
**TL;DR**: Rearranging the terms in Maximum Mean Discrepancy yields a much better loss function for the discriminator of Generative Adversarial Nets.
**Keywords**: Generative adversarial nets, Maximum Mean Discrepancy, spectral normalization, convolutional neural networks, Gaussian kernel, local stability.
**Summary**
Generative adversarial nets (GANs) are widely used to learn the data sampling process and are notoriously difficult to train. The training of GANs may be improved from three asp...
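The rearrangement in the TL;DR can be sketched with a Gaussian kernel over discriminator features. This is a hedged illustration, not the paper's exact equations: the function names are mine, and the repulsive form below is a simplified version of the paper's loss.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) over rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(real, fake, sigma=1.0):
    # MMD^2 = E[k(r, r')] - 2 E[k(r, f)] + E[k(f, f')]
    return (gaussian_kernel(real, real, sigma).mean()
            - 2 * gaussian_kernel(real, fake, sigma).mean()
            + gaussian_kernel(fake, fake, sigma).mean())

def d_loss_attractive(real, fake, sigma=1.0):
    # Standard MMD-GAN discriminator loss: minimize -MMD^2.
    return -mmd2(real, fake, sigma)

def d_loss_repulsive(real, fake, sigma=1.0):
    # Repulsive rearrangement (sketch): keep only the within-real and
    # within-fake terms, so minimizing the loss spreads real features
    # apart instead of contracting them.
    return (gaussian_kernel(real, real, sigma).mean()
            - gaussian_kernel(fake, fake, sigma).mean())
```

Here `real` and `fake` stand for discriminator feature batches; in the paper the kernel is computed on learned features, not raw images.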
http://www.shortscience.org/paper?bibtexKey=journals/corr/1812.09916#richardwth
Tue, 15 Jan 2019 05:07:15 +0700 | arXiv:1802.03685 | Learning a SAT Solver from Single-Bit Supervision | ameroyer
The goal is to solve SAT problems with weak supervision: in that case a model is trained only to predict ***the satisfiability*** of a formula in conjunctive normal form. As a byproduct, when the formula is satisfiable, an actual satisfying assignment can be worked out by clustering the network's activations in most cases.
* **Pros (+):** Weak supervision, interesting structured architecture; seems to generalize nicely to harder problems by increasing the number of message-passing iterations.
...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs180203685#ameroyer
Mon, 14 Jan 2019 12:58:19 +0700 | arXiv:1809.01442 | Data Augmentation for Skin Lesion Analysis | Fábio Perez
_Disclaimer: I'm the first author of this paper._
The code for this paper can be found at .
In this work, we wanted to compare different data augmentation scenarios for skin lesion analysis. We tried 13 scenarios, including commonly used augmentation techniques (color and geometry transformations), unusual ones (random erasing, elastic transformation, and a novel lesion mix to simulate collision lesions), and a combination of those.
Examples of the augmentation scenarios:
a) no augmentati...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.01442#fabioperez
Mon, 14 Jan 2019 10:23:45 +0700 | arXiv:1802.10217 | Investigating Human Priors for Playing Video Games | Fábio Perez
The authors investigated why humans play some video games better than machines, which is the case for games that do not have continuous rewards (e.g., scores). They experimented with a game, inspired by _Montezuma's Revenge_, in which the player has to climb stairs, collect keys, and jump over enemies. RL algorithms only know whether they succeeded once they finish the game, as there are no rewards during gameplay, so they tend to do much worse than humans in these games.
To compare between humans ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.10217#fabioperez
Fri, 28 Dec 2018 20:19:27 +0700 | arXiv:1710.10196 | Progressive Growing of GANs for Improved Quality, Stability, and Variation | ANIRUDH NJ
## **Keywords**
Progressive GAN, high-resolution generator

## **Summary**
1. **Introduction**
    1. **Goal of the paper**
        1. Generation of very high quality images by progressively increasing the size of the generator and discriminator.
        1. Improved training and stability of GANs.
        1. A new metric for evaluating GAN results.
        1. A high-quality version of the CelebA-HQ dataset.
    1. **Previous Research**
        1. Generative methods help to produce new s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs171010196#anirudhnj
Fri, 28 Dec 2018 18:33:17 +0700 | arXiv:1810.09136 | Do Deep Generative Models Know What They Don't Know? | ameroyer
CNN predictions are known to be very sensitive to adversarial examples, which are samples generated to be wrongly classified with high confidence. On the other hand, probabilistic generative models such as `PixelCNN` and `VAEs` learn a distribution over the input domain, hence could be used to detect ***out-of-distribution inputs***, e.g., by estimating their likelihood under the data distribution. This paper provides interesting results showing that distributions learned by generative models a...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.09136#ameroyer
Mon, 17 Dec 2018 10:20:46 +0700 | arXiv:1806.07366 | Neural Ordinary Differential Equations | wassname
Summary by senior author [duvenaud on hackernews]().
A few years ago, everyone switched their deep nets to "residual nets". Instead of building deep models like this:
h1 = f1(x)
h2 = f2(h1)
h3 = f3(h2)
h4 = f4(h3)
y = f5(h4)
They now build them like this:
h1 = f1(x) + x
h2 = f2(h1) + h1
h3 = f3(h2) + h2
h4 = f4(h3) + h3
y = f5(h4) + h4
where f1, f2, etc. are neural net layers. The idea is that it's easier to model a small change to an almost-correc...
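The residual pattern above is exactly Euler integration of an ODE dh/dt = f(h, t), which is the paper's starting point: a Neural ODE replaces the fixed stack of layers with an adaptive ODE solver. A minimal sketch, with a toy dynamics function standing in for a trained layer:

```python
import math

def f(h, t):
    # Toy dynamics; in a residual net this would be a learned layer.
    return -0.5 * h

def resnet_forward(h, n_steps, dt=1.0):
    # Residual updates h <- h + dt * f(h, t) are Euler steps of dh/dt = f.
    for i in range(n_steps):
        h = h + dt * f(h, i * dt)
    return h

# With many small steps, the "network" approaches the exact ODE solution
# h(t) = h0 * exp(-0.5 * t) at t = 1.
approx = resnet_forward(1.0, 1000, dt=0.001)
exact = math.exp(-0.5)
```

With dt = 1 and a handful of steps you get the residual net above; shrinking dt while growing the step count recovers the continuous-depth limit.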
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.07366#wassname
Sun, 16 Dec 2018 04:33:03 +0700 | arXiv:1802.04865 | Learning Confidence for Out-of-Distribution Detection in Neural Networks | elbaro
## Summary
In the prior work 'On Calibration of Modern Neural Networks', temperature scaling is used for outputting confidence. This is done at inference time and does not change the existing classifier. This paper instead considers confidence at the training stage, and directly outputs the confidence from the network.
## Architecture
An additional branch for confidence is added after the penultimate layer, in parallel to logits and probs (Figure 2).
## Training
The network outputs the prob $p$ and...
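The training objective can be sketched as follows. Hedged: variable names are mine; the interpolation p' = c·p + (1−c)·y and the −log c penalty follow the paper's description of hinting with a confidence budget.

```python
import math

def confidence_loss(probs, target_onehot, confidence, lam=0.1):
    # Interpolate between the model's probs and the ground truth, using
    # the predicted confidence c in (0, 1) as the mixing weight: asking
    # for "hints" (low c) makes the task loss easier but costs -log c.
    p_hinted = [confidence * p + (1 - confidence) * y
                for p, y in zip(probs, target_onehot)]
    nll = -math.log(sum(p * y for p, y in zip(p_hinted, target_onehot)))
    # Penalty discourages the network from always asking for hints (c -> 0).
    return nll - lam * math.log(confidence)
```

At test time the confidence output alone is then used as the out-of-distribution score.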
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.04865#elbaro
Mon, 10 Dec 2018 07:30:06 +0700 | arXiv:1706.02690 | Enhancing the Reliability of Out-of-distribution Image Detection in Neural Networks | elbaro
## Task
Add a '**rejection**' output to an existing classification model with a softmax layer.
## Method
1. Choose a threshold $\delta$ and temperature $T$.
2. Add a perturbation to the input $x$ (eq. 2): let $\tilde x = x - \epsilon\, \text{sign}(\nabla_x \log S_{\hat y}(x;T))$.
3. If $p(\tilde x;T) \le \delta$, reject.
4. Otherwise, return the output of the original classifier.

$p(\tilde x;T)$ is the max probability with temperature scaling for input $\tilde x$; $\delta$ and $T$ are chosen manually.
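Steps 3 and 4 can be sketched as below. This is a hedged illustration with my own function names; step 2 (the input perturbation) needs the network's input gradient and is omitted, so the function assumes it receives logits for an already-perturbed input.

```python
import numpy as np

def softmax_T(logits, T):
    # Temperature-scaled softmax S(x; T).
    z = logits / T
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def odin_decide(perturbed_logits, T=1000.0, delta=0.1):
    # Compute the max temperature-scaled probability for the (already
    # perturbed) input; reject as out-of-distribution below the threshold.
    score = softmax_T(perturbed_logits, T).max()
    return "reject" if score <= delta else "accept"
```

Large T flattens the softmax, which is what makes the max-probability score more separable between in- and out-of-distribution inputs.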
...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.02690#elbaro
Mon, 10 Dec 2018 07:17:15 +0700 | arXiv:1706.04599 | On Calibration of Modern Neural Networks | elbaro
## Task
A neural network for classification typically has a **softmax** layer and outputs the class with the maximum probability. However, this probability does not represent **confidence**. If the average confidence (the average of the max probabilities) over a dataset matches the accuracy, the model is called **well-calibrated**. Older models like LeNet (1998) were well-calibrated, but modern networks like ResNet (2016) are no longer well-calibrated. This paper explains what caused this and compares various calibration...
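The well-calibrated condition can be checked directly (a minimal sketch; the paper's actual diagnostic, ECE, bins predictions by confidence first):

```python
def calibration_gap(max_probs, correct):
    # A model is well-calibrated when its average confidence (mean of the
    # max softmax probabilities) matches its accuracy over a dataset.
    avg_confidence = sum(max_probs) / len(max_probs)
    accuracy = sum(correct) / len(correct)
    return avg_confidence - accuracy   # > 0: over-confident (modern nets)
```

A positive gap is the over-confidence the paper reports for modern networks; temperature scaling shrinks it without changing the predicted classes.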
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.04599#elbaro
Mon, 10 Dec 2018 05:52:45 +0700 | arXiv:1811.04551 | Learning Latent Dynamics for Planning from Pixels | wassname
**Summary**: This paper presents three tricks that make model-based reinforcement learning more reliable when tested on tasks that require walking and balancing. The tricks are 1) planning based on features, 2) using a recurrent network that mixes probabilistic and deterministic information, and 3) looking forward multiple steps.
**Longer summary**
Imagine playing pool, armed with a tablet that can predict exactly where the ball will bounce, and the next bounce, and so on. That would be a huge adva...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.04551#wassname
Sun, 09 Dec 2018 11:50:05 +0700 | arXiv:1807.03146 | Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning | Krishna Murthy
What the paper is about:
KeypointNet learns the optimal set of 3D keypoints and their 2D detectors for a specified downstream task. The authors demonstrate this by extracting 3D keypoints and their 2D detectors for the task of relative pose estimation across views. They show that, using keypoints extracted by KeypointNet, relative pose estimates are superior to ones that are obtained from a supervised set of keypoints.
Approach:
Training samples for KeypointNet comprise two views (images) of a...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.03146#krishnamurthy
Thu, 06 Dec 2018 08:04:18 +0700 | conf/nips/GomezRUG17 | The Reversible Residual Network: Backpropagation Without Storing Activations | ameroyer
Residual Networks (ResNets) have greatly advanced the state-of-the-art in deep learning by making it possible to train much deeper networks via the addition of skip connections. However, in order to compute gradients during the backpropagation pass, all the units' activations have to be stored during the feed-forward pass, leading to high memory requirements for these very deep networks.
Instead, the authors propose a **reversible architecture** based on ResNets, in which activations at one l...
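The reversible block can be sketched in a few lines (a minimal illustration; function names are mine, and F and G stand for arbitrary residual sub-networks operating on a channel split):

```python
def rev_block_forward(x1, x2, F, G):
    # Reversible residual block (RevNet): the input is split into (x1, x2)
    # and each half is updated using a residual function of the other.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_block_inverse(y1, y2, F, G):
    # The inputs are reconstructed exactly from the outputs, so the
    # forward pass does not need to store activations for backprop.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

Because the inverse is exact, activations can be recomputed on the fly during the backward pass, trading a little compute for near-constant activation memory.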
http://www.shortscience.org/paper?bibtexKey=conf/nips/GomezRUG17#ameroyer
Wed, 05 Dec 2018 15:14:10 +0700 | arXiv:1712.09913 | Visualizing the Loss Landscape of Neural Nets | daisukelab
* Presents a simple visualization method based on “filter normalization.”
* Observed that __as networks become deeper, neural loss landscapes become more chaotic__; this causes a dramatic drop in generalization error, and ultimately a lack of trainability.
* Observed that __skip connections promote flat minimizers and prevent the transition to chaotic behavior__; this helps explain why skip connections are necessary for training extremely deep networks.
* Quantitatively measures non-convexity.
* S...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.09913#niz
Wed, 05 Dec 2018 13:58:02 +0700 | arXiv:1703.06189 | TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals | shiyu
## Temporal unit regression network
keyword: temporal action proposal; computing efficiency
**Summary**: In this paper, Jiyang et al. design a proposal generation and refinement network with high computational efficiency by reusing unit features in coordinated regression and classification networks. In particular, a new metric for temporal proposals, AR-F, is introduced to meet two criteria: 1. evaluate different methods on the same dataset efficiently; 2. be capable of evaluating the same method'...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.06189#daisy
Wed, 05 Dec 2018 12:03:51 +0700 | arXiv:1601.02129 | Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs | shiyu
## Segment-CNN
**Summary**: this paper uses a three-stage 3D CNN to identify candidate proposals, recognize actions, and localize temporal boundaries.
**Models**:
The network can be mainly divided into 3 parts: proposal generation, proposal selection with temporal-boundary refinement, and NMS to remove redundant proposals.
1. Generate multi-scale (16, 32, 64, 128, 256, 512) segments using a sliding window with 75% overlap. High computational complexity!
2. Network: each stage of the three-stage network is using...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1601.02129#daisy
Wed, 05 Dec 2018 12:03:16 +0700 | arXiv:1806.02964 | BSN: Boundary Sensitive Network for Temporal Action Proposal Generation | shiyu
## Boundary-sensitive network
### **keyword**: action detection in video; accurate proposal
**Summary**: In order to generate precise temporal boundaries and improve recall with fewer proposals, Tianwei Lin et al. use BSN, which first combines temporal boundaries with high probability to form proposals, and then selects proposals by evaluating whether a proposal contains an action (confidence score + boundary probability).
**Model**:
1. Video feature encoding: use a two-stream extractor to for...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.02964#daisy
Tue, 04 Dec 2018 03:11:41 +0700 | arXiv:1807.00392 | Gradient Reversal Against Discrimination | ameroyer
Given some input data $x$ and attribute $a_p$, the task is to predict label $y$ from $x$ while making $a_p$ *protected*; in other words, such that the model predictions are invariant to changes in $a_p$.
* **Pros (+)**: Simple and intuitive idea, easy to train, naturally extended to protecting multiple attributes.
* **Cons (−)**: Comparison to baselines could be more detailed / comprehensive, in particular the comparison to ALFR [4], which also relies on adversarial training.
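The central mechanism, a gradient reversal layer, can be sketched as follows (a minimal illustration of the autograd behavior; the class and parameter names are mine, not the paper's):

```python
class GradientReversal:
    # Identity in the forward pass; scales gradients by -lam in the
    # backward pass, so the shared features are trained to *hurt* the
    # adversary that predicts the protected attribute a_p.
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]
```

In a real framework this would be implemented as a custom autograd op placed between the feature extractor and the attribute-prediction head.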

## Pr...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.00392#ameroyer
Mon, 03 Dec 2018 13:03:21 +0700 | arXiv:1511.06984 | End-to-end Learning of Action Detection from Frame Glimpses in Videos | shiyu
### **Keyword**: RNN; serialized model; non-differentiable backpropagation; action detection in video
**Abstract**: This paper uses an end-to-end model, a recurrent neural network trained with REINFORCE, to directly predict the temporal bounds of actions. The intuition is that people observe moments in a video and decide where to look next to predict when an action is occurring. After training, Serena et al. manage to achieve state-of-the-art results while observing only 2% of the video frames....
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.06984#daisy
Mon, 03 Dec 2018 07:33:43 +0700 | arXiv:1704.06228 | Temporal Action Detection with Structured Segment Networks | shiyu
## Structured segment network
### **Keyword**: action detection in video; reducing computational complexity; structured proposals
**Abstract**: uses a temporal actionness grouping (TAG) scheme to generate accurate proposals; uses a structured temporal pyramid to model the temporal structure of each action instance, tackling the issue that detected actions are not complete; uses two classifiers to determine class and completeness; and uses a regressor for each category to further refine the temporal bou...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.06228#daisy
Mon, 03 Dec 2018 07:30:44 +0700 | arXiv:1809.11044 | Relational Forward Models for Multi-Agent Learning | CodyWild
One of the dominant narratives of the deep learning renaissance has been the value of well-designed inductive bias: structural choices that shape what a model learns. The biggest example of this can be found in convolutional networks, where models achieve a dramatic parameter reduction by having feature maps learn local patterns, which can then be reused across the whole image. This is based on the prior belief that patterns in local images are generally locally contiguous, and so having feat...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.11044#decodyng
Sat, 01 Dec 2018 01:13:47 +0700 | arXiv:1710.08864 | One pixel attack for fooling deep neural networks | ANIRUDH NJ
## **Keywords**
One pixel attack, adversarial examples, differential evolution, targeted and non-targeted attacks

## **Summary**
1. **Introduction**
    1. **Basics**
        1. Deep learning methods are better than traditional image processing techniques in most cases in the computer vision domain.
        1. "Adversarial examples" are specifically modified images with imperceptible perturbations that are classified wrongly by the network.
    1. **Goals of the paper**
...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.08864#anirudhnj
Fri, 30 Nov 2018 14:13:36 +0700 | arXiv:1503.03832 | FaceNet: A Unified Embedding for Face Recognition and Clustering | ANIRUDH NJ
## Keywords
Triplet loss, face embedding, harmonic embedding

## Summary
### Introduction
**Goal of the paper**
* A unified system for face verification, recognition, and clustering.
* Uses a 128-dimensional, pose- and illumination-invariant feature vector (embedding) in Euclidean space.
* Face verification: faces of the same person give feature vectors with a very small L2 distance between them.
* Face recognition: face recognition becomes a clustering task in the emb...
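The triplet loss that shapes this embedding can be sketched in plain Python (helper names are mine; FaceNet compares squared L2 distances with a margin):

```python
def sq_l2(u, v):
    # Squared L2 distance between two embedding vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Push the anchor-negative distance to exceed the anchor-positive
    # distance by at least `margin`; the loss is zero once that holds.
    return max(sq_l2(anchor, positive) - sq_l2(anchor, negative) + margin, 0.0)
```

In training, the anchor and positive are embeddings of the same identity and the negative is a different identity; hard-triplet selection decides which triplets to feed this loss.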
http://www.shortscience.org/paper?bibtexKey=journals/corr/1503.03832#anirudhnj
Fri, 30 Nov 2018 10:42:36 +0700 | arXiv:1312.6199 | Intriguing properties of neural networks | ANIRUDH NJ
### Keywords
Adversarial examples, perturbations

### Summary
##### Introduction
* Explains two properties of neural networks that cause them to misclassify images and make it difficult to gain a solid understanding of the network:
1. Theoretical understanding of individual high-level units of a network and of combinations of these units or layers.
2. Understanding the continuity of the input-output mapping space and the stability of the output with respect to the input.
* Performing a few experiments ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1312.6199#anirudhnj
Fri, 30 Nov 2018 10:41:26 +0700 | arXiv:1810.11910 | Learning to Learn without Forgetting by Maximizing Transfer and Minimizing Interference | wassname
Catastrophic forgetting is the tendency of a neural network to forget previously learned information when learning new information. This paper combats that by keeping a buffer of experience and applying meta-learning to it. They call their new module Meta Experience Replay, or MER.
How does this work? At each update they compute multiple possible updates to the model weights: one for the new batch of information and some more for batches of previous experience. Then they apply meta-lea...
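The meta-update at the end of this procedure can be sketched as a Reptile-style first-order step (a hedged simplification; the function name and flat-list weights are mine, and MER's full algorithm interleaves this across batches and examples):

```python
def meta_update(theta_before, theta_after, gamma=0.1):
    # After running SGD over a mix of new and replayed batches
    # (theta_before -> theta_after), move only a fraction gamma along
    # that overall direction. This favors weight changes that agree
    # across batches (transfer) over ones that conflict (interference).
    return [b + gamma * (a - b) for b, a in zip(theta_before, theta_after)]
```

Setting gamma = 1 recovers plain experience replay; smaller gamma is what adds the meta-learning flavor.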
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.11910#wassname
Thu, 29 Nov 2018 22:10:31 +0700 | arXiv:1811.06272 | Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search | CodyWild
It is a fact universally acknowledged that a reinforcement learning algorithm not in possession of a model must be in want of more data. Because they generally are. Joking aside, it is broadly understood that model-free RL takes a lot of data to train, and, even when you can design them to use off-policy trajectories, collecting data in the real environment might still be too costly. Under those conditions, we might want to learn a model of the environment and generate synthesized trajectories, ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.06272#decodyng
Thu, 29 Nov 2018 07:14:06 +0700 | arXiv:1811.06521 | Reward learning from human preferences and demonstrations in Atari | wassname
How can humans help an agent perform a task that has no clear reward? Imitation, demonstration, and preferences. This paper asks which combinations of imitation, demonstration, and preferences best guide an agent in Atari games.
For example, consider an agent playing Pong on the Atari that can't access the score. You might help it by demonstrating your play style for a few hours. To help the agent further, you are shown two short clips of it playing and are asked to indicate which one,...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.06521#wassname
Thu, 29 Nov 2018 02:08:07 +0700 | arXiv:1711.10485 | AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks | CodyWild
This paper feels a bit like watching a 90’s show where everyone’s in denim and miniskirts, except it’s a 2017 ML paper, and everything uses attention. (I’ll say it again: ML years are like dog years, but more so.) That said, that’s not a critique of the paper: finding clever ways to cobble together techniques for your application can be an important and valuable contribution. This paper addresses the problem of text-to-image generation: how to take a description of an image and generate...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.10485#decodyng
Wed, 28 Nov 2018 04:55:26 +0700 | arXiv:1807.03247 | An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution | CodyWild
This is a paper where I keep being torn between the response of “this is so simple it’s brilliant; why haven’t people done it before?” and “this is so simple it’s almost tautological, and the results I’m seeing aren’t actually that surprising”. The basic observation this paper makes is one made frequently before, most recently to my memory by Geoff Hinton in his Capsule Net paper: sometimes the translation invariance of convolutional networks can be a bad thing, and lead to wor...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.03247#decodyng
Tue, 27 Nov 2018 06:55:14 +0700 | arXiv:1712.09913 | Visualizing the Loss Landscape of Neural Nets | CodyWild
This paper was a real delight to read, and even though I’m summarizing it here, I’d really encourage you, if you’re reading this, to read the paper itself, since I found it to be unusually clearly written. It tackles the problem of understanding how features of loss functions (these integral, yet arcane, objects defined in millions of parameter dimensions) impact model performance. Loss function analysis is generally a difficult area, since the number of dimensions and number of points n...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.09913#decodyng
Mon, 26 Nov 2018 07:05:34 +0700 | arXiv:1809.02861 | On the Intriguing Connections of Regularization, Input Gradients and Transferability of Evasion and Poisoning Attacks | CodyWild
This paper focuses on the well-known fact that adversarial examples are often transferable: that is, an adversarial example created by optimizing loss on a surrogate model trained on similar data can often still induce increased loss on the true target model, though typically not to the same magnitude as an example optimized against the target itself. Its goal is to come up with a clearer theoretical formulation for transferred examples, and to more clearly understand what kinds of models transf...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.02861#decodyng
Sun, 25 Nov 2018 01:36:26 +0700 | arXiv:1806.11146 | Adversarial Reprogramming of Neural Networks | CodyWild
In the literature of adversarial examples, there’s this (to me) constant question: is it the case that adversarial examples are causing the model to objectively make a mistake, or just displaying behavior that is deeply weird and unintuitive relative to our sense of what these models “should” be doing? A lot of the former question seems to come down to arguing about what’s technically “out of distribution”, which has an occasional angels-dancing-on-a-pin quality, but it’s pre...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.11146#decodyng
Sat, 24 Nov 2018 05:14:44 +0700 | arXiv:1807.09341 | Learning Plannable Representations with Causal InfoGAN | CodyWild
This paper tries to solve the problem of how to learn systems that, given a starting state and a desired target, can learn the set of actions necessary to reach that target. The strong version of this problem requires a planning algorithm to learn a full set of actions to take the agent from state A to B. However, this is a difficult and complex task, and so this paper addresses a relaxed version of it: generating a set of “waypoint” observations between A and B, such that each ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.09341#decodyng
Fri, 23 Nov 2018 06:49:58 +0700 | arXiv:1810.08647 | Intrinsic Social Motivation via Causal Influence in Multi-Agent RL | CodyWild
This paper builds very directly on the idea of “empowerment” as an intrinsic reward for RL agents. Where empowerment incentivizes agents to increase the amount of influence they’re able to have over the environment, “social influence”, this paper’s metric, is based on the degree to which the actions of one agent influence the actions of other agents within a multi-agent setting. The goals of the two frameworks are a little different. The notion of “empowerment” is built around...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.08647#decodyng
Wed, 21 Nov 2018 05:17:28 +0700 | arXiv:1811.06032 | Natural Environment Benchmarks for Reinforcement Learning | wassname
This paper proposes three new reinforcement learning tasks that involve dealing with images.
* Task 1: An agent crawls across a hidden image, revealing portions of it at each step. It must classify the image in the minimum number of steps, for example classifying the image as a cat after choosing to travel across the ears.
* Task 2: The agent crawls across a visible image to sit on its target, for example a cat in a scene of pets.
* Task 3: The agent plays an Atari game where the background has...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.06032#wassname
Wed, 21 Nov 2018 00:18:59 +0700 | arXiv:1703.04908 | Emergence of Grounded Compositional Language in Multi-Agent Populations | CodyWild
This paper performs a fascinating toy experiment to see if something language-like in structure can be effectively induced in a population of agents, if they are given incentives that promote it. In some sense, a lot of what they find “just makes sense”, but it’s still a useful proof of concept to show that it can be done.
The experiment they run takes place in a simple, two-dimensional world, with a fixed number of landmarks (representing locations where goals need to take place), and...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.04908#decodyng
Mon, 19 Nov 2018 01:55:00 +0700 | arXiv:1810.12162 | Model-Based Active Exploration | CodyWild
This paper continues the tradition of curiosity-based models, which try to reward models for exploring novel parts of their environment, in the hope that this can intrinsically motivate learning. However, this paper argues that it’s insufficient to just treat novelty as an occasional bonus on top of a normal reward function, and that instead you should figure out a process more specifically designed to increase novelty. Specifically: you should design a policy whose goal is to experien...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.12162#decodyng
Sat, 17 Nov 2018 07:30:01 +0700 | arXiv:1810.02274 | Episodic Curiosity through Reachability | CodyWild
This paper proposes a new curiosity-based intrinsic reward technique that seeks to address one of the failure modes of previous curiosity methods. The basic idea of curiosity is that exploring novel areas of an environment is often correlated with gaining reward within that environment, and that we can find ways to incentivize the former that don’t require a hand-designed reward function. This is appealing because many useful-to-learn environments either lack inherent reward altogether, ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.02274#decodyng
Fri, 16 Nov 2018 02:45:07 +0700 | arXiv:1808.04355 | Large-Scale Study of Curiosity-Driven Learning | CodyWild
I really enjoyed this paper: in addition to being a clean, fundamentally empirical work, it was also clearly written, and had some pretty delightful moments of quotable zen, which I’ll reference at the end. The paper’s goal is to figure out how far curiosity-driven learning alone can take reinforcement learning systems, without the presence of an external reward signal. “Intrinsic” reward learning is when you construct a reward out of internal, inherent features of the environment, rath...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1808.04355#decodyng
Thu, 15 Nov 2018 05:45:55 +0700 | arXiv:1809.04474 | Multi-task Deep Reinforcement Learning with PopArt | CodyWild
This paper posits that one of the central problems stopping multi-task RL (single models trained to perform multiple tasks well) from reaching better performance is the inability to balance model resources and capacity between the different tasks the model is being asked to learn. Empirically, prior to this paper, multi-task RL could reach ~50% of human accuracy on Atari and DeepMind Lab tasks. The fact that this is lower than human accuracy is actually somewhat less salient than the...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.04474#decodyng
Tue, 13 Nov 2018 08:26:54 +0700 | arXiv:1802.01561 | IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures | CodyWild
This reinforcement learning paper starts with the constraints imposed by an engineering problem (the need to scale up learning to operate across many GPUs) and, as a result, ends up needing to solve an algorithmic problem along with it.
In order to massively scale up their training to be able to train multiple problem domains in a single model, the authors of this paper implemented a system whereby many “worker” nodes execute trajectories (series of actions, states, and rewards) an...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.01561#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.01561#decodyngMon, 12 Nov 2018 08:19:15 07001811.02549journals/corr/1811.025496Language GANs Falling ShortCodyWildThis paper’s high-level goal is to evaluate how well GAN-type structures for generating text are performing, compared to more traditional maximum likelihood methods. In the process, it zooms into the ways that the current set of metrics for comparing text generation fails to give a well-rounded picture of how models are performing.
In the old paradigm of maximum likelihood estimation, models were both trained and evaluated on maximizing the likelihood of each word, given the prior words in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.02549#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.02549#decodyngSat, 10 Nov 2018 08:20:21 07001609.05473journals/corr/1609.054732SeqGAN: Sequence Generative Adversarial Nets with Policy GradientCodyWildGANs for images have made impressive progress in recent years, reaching ever-higher levels of subjective realism. It’s also interesting to think about domains where the GAN architecture is less of a good fit. An example of one such domain is natural language.
As opposed to images, which are made of continuous pixel values, sentences are fundamentally sequences of discrete values: that is, words. In a GAN, when the discriminator makes its assessment of the realness of the image, the gradient ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1609.05473#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1609.05473#decodyngFri, 09 Nov 2018 05:29:01 07001811.01778journals/corr/1811.017784On the Evaluation of Common-Sense Reasoning in Natural Language UnderstandingCodyWildI should say from the outset: I have a lot of fondness for this paper. It goes upstream of a lot of research-community incentives: It’s not methodologically flashy, it’s not about beating the State of the Art with a bigger, better model (though, those papers certainly also have their place). The goal of this paper was, instead, to dive into a test set used to evaluate performance of models, and try to understand to what extent it’s really providing a rigorous test of what we want out of mo...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.01778#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.01778#decodyngWed, 07 Nov 2018 04:56:41 07001810.06682journals/corr/1810.066823Trellis Networks for Sequence ModelingCodyWildFor solving sequence modeling problems, recurrent architectures have been historically the most commonly used solution, but, recently, temporal convolution networks, especially with dilations to help capture longer term dependencies, have gained prominence. RNNs have theoretically much larger capacity to learn long sequences, but also have a lot of difficulty propagating signal forward through long chains of recurrent operations. This paper, which suggests the approach of Trellis Networks, place...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.06682#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.06682#decodyngMon, 05 Nov 2018 07:46:39 07001808.04891journals/corr/1808.048912Embedding GrammarsCodyWildThis paper is, on the whole, a refreshing jaunt into the applied side of the research world. It isn’t looking to solve a fundamental machine learning problem in some new way, but it does highlight and explore one potential beneficial application of a common and widely used technique: specifically, combining word embeddings with context-free grammars (such as regular expressions), to make the latter less rigid.
Regular expressions work by specifying specific hard-coded patterns of symbols, and...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1808.04891#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1808.04891#decodyngSun, 04 Nov 2018 06:52:09 07001810.13409journals/corr/1810.134094You May Not Need AttentionOfir PressAn attention mechanism and a separate encoder/decoder are two properties of almost every single neural translation model. The question asked in this paper is how far can we go without attention and without a separate encoder and decoder? And the answer is pretty far! The model presented performs just as well as the attention model of Bahdanau on the four language directions that are studied in the paper.
The translation model presented in the paper is basically a simple recurrent language mod...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.13409#ofirpress
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.13409#ofirpressSat, 03 Nov 2018 09:31:24 06001810.04805journals/corr/1810.048056BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingCodyWildThe last two years have seen a number of improvements in the field of language model pretraining, and BERT (Bidirectional Encoder Representations from Transformers) is the most recent entry into this canon. The general problem posed by language model pretraining is: can we leverage huge amounts of raw text, which aren’t labeled for any specific classification task, to help us train better models for supervised language tasks (like translation, question answering, logical entailment, etc)? Me...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.04805#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.04805#decodyngFri, 02 Nov 2018 06:43:01 06001810.06721journals/corr/1810.067213Optimizing Agent Behavior over Long Time Scales by Transporting ValuewassnameThis builds on the previous ["MERLIN"]() paper. First they introduce the RMA agent, which is a simplified version of MERLIN which uses model based RL and long term memory. They give the agent long term memory by letting it choose to save and load the agent's working memory (represented by the LSTM's hidden state).
Then they add credit assignment, similar to the RUDDER paper, to get the "Temporal Value Transport" (TVT) agent that can plan long term in the face of distractions. **The critical in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.06721#wassname
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.06721#wassnameFri, 02 Nov 2018 01:45:04 060010.1101/2256642Prioritized memory access explains planning and hippocampal replaywassname**TL;DR:** There are 'place cells' in the hippocampus that fire when passing through a location. You can take a rat and measure how its cells are activated in a maze, then monitor neurons during planning, rest or sleep. You'll see patterns that show it's thinking of locations in order and focusing on interesting locations. This paper looks at how RL agents do 'prioritized experience replay' and compares it to place cells in animals. The authors run an RL simulation and *qualitatively* compare...
http://www.shortscience.org/paper?bibtexKey=10.1101/225664#wassname
http://www.shortscience.org/paper?bibtexKey=10.1101/225664#wassnameSun, 28 Oct 2018 04:05:27 06001806.07857journals/corr/1806.078573RUDDER: Return Decomposition for Delayed Rewardswassname[Summary by author /u/SirJAM_armedi]().
Math aside, the "big idea" of RUDDER is the following: We use an LSTM to predict the return of an episode. To do this, the LSTM will have to recognize what actually causes the reward (e.g. "shooting the gun in the right direction causes the reward, even if we get the reward only once the bullet hits the enemy after travelling along the screen"). We then use a salience method (e.g. LRP or integrated gradients) to get that information out of the LSTM, and r...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.07857#wassname
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.07857#wassnameSun, 28 Oct 2018 04:05:08 06001810.02334journals/corr/1810.023344Unsupervised Learning via Meta-LearningCodyWildThis recent paper, a collaboration involving some of the authors of MAML, proposes an intriguing application of techniques developed in the field of meta-learning to the problem of unsupervised learning: specifically, the problem of developing representations without labeled data, which can then be used to learn quickly from a small amount of labeled data. As a reminder, the idea behind meta-learning is that you train models on multiple different tasks, using only a small amount of data from ea...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.02334#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.02334#decodyngSat, 13 Oct 2018 03:41:42 060010.1109/83.9022912Active contours without edgesAnmol SharmaTypically, the energy minimization or snakes-based object detection frameworks evolve a parametrized curve guided by some form of image gradient information. However, due to heavy reliance on gradients, the approaches tend to fail in scenarios where this information is misleading or unavailable. This cripples the snake and renders it unusable as it gets stuck in a local minimum away from the actual object. Moreover, the parametrized snake lacks the ability to model multiple evolving curves in a si...
http://www.shortscience.org/paper?bibtexKey=10.1109/83.902291#anmolsharma
http://www.shortscience.org/paper?bibtexKey=10.1109/83.902291#anmolsharmaWed, 10 Oct 2018 20:38:38 060010.1007/bf001335702Snakes: Active contour modelsAnmol SharmaLow-level tasks such as edge, contour and line detection are an essential precursor to any downstream image analysis processes. However, most of the approaches targeting these problems work as isolated and autonomous entities, without using any high-level image information such as context, global shapes, or user-level input. This leads to errors that can further propagate through the pipeline without providing an opportunity for future correction. In order to address this problem, Kass et al. in...
http://www.shortscience.org/paper?bibtexKey=10.1007/bf00133570#anmolsharma
http://www.shortscience.org/paper?bibtexKey=10.1007/bf00133570#anmolsharmaWed, 10 Oct 2018 20:18:42 06001806.00340journals/corr/1806.003402Producing radiologist-quality reports for interpretable artificial intelligenceTess BerthierThe paper presents a model-agnostic extension of deep learning classifiers based on an RNN with a visual attention mechanism for report generation.
One of the most important points in this paper is not the model, but the dataset itself: Luke Oakden-Rayner, one of the authors, is a radiologist and worked a lot to educate the public on current medical datasets ([chest x-ray blog post]()), how they are made and what problems are associated with them. In this paper they used 50,363 f...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.00340#tessberthier
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.00340#tessberthierWed, 03 Oct 2018 20:51:21 06001512.03385journals/corr/HeZRS152Deep Residual Learning for Image RecognitionEddie SmolanskySources:


Summary:
* Took first place in the 5 main ImageNet tracks
* Revolution of depth: GoogLeNet was 22 layers with 6.7% top-5 error; ResNet is 152 layers with 3.57% top-5 error
* Light on complexity: the 34-layer baseline uses 18% of the FLOPs (multiply-adds) of VGG
* ResNet-152 has lower time complexity than VGG-16/19
* Extends well to detection and segmentation tasks
* Just stacking more layers gives worse performance. Why? In theory:
> A deeper model should not have higher...
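The quoted intuition, that a deeper model should be able to fall back to the identity, is exactly what the residual block makes easy. A minimal sketch in plain NumPy (fully connected with shapes of my choosing, not the paper's convolutional blocks):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Computes relu(F(x) + x): with the identity shortcut, the stacked
    layers only need to learn the residual F(x), not the full mapping."""
    out = relu(x @ W1)    # first weight layer + nonlinearity
    out = out @ W2        # second weight layer
    return relu(out + x)  # add the shortcut, then the final activation

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1 = rng.standard_normal((8, 8)) * 0.01
W2 = rng.standard_normal((8, 8)) * 0.01
y = residual_block(x, W1, W2)
# With near-zero weights F(x) is near zero, so the block nearly passes
# x straight through (up to the final ReLU) -- which is why deeper
# residual stacks should not do worse than shallower ones.
```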
http://www.shortscience.org/paper?bibtexKey=journals/corr/HeZRS15#eddiesmolansky
http://www.shortscience.org/paper?bibtexKey=journals/corr/HeZRS15#eddiesmolanskySun, 23 Sep 2018 20:47:58 06001711.07618journals/corr/1711.076182$S^4$Net: Single Stage Salient-Instance SegmentationEddie SmolanskyIt's like Mask R-CNN but for salient instances.
code will be available at .
They invented a layer "mask pooling" that they claim is better than ROI pooling and ROI align.
> As can be seen, our proposed binary RoIMasking and ternary RoIMasking both outperform RoIPool and RoIAlign in mAP$^{0.7}$. Specifically, our ternary RoIMasking result improves the RoIAlign result by around 2.5 points. This reflects that considering more context information outside the proposals does help for salient instance seg...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.07618#eddiesmolansky
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.07618#eddiesmolanskySun, 23 Sep 2018 20:39:52 06001705.07426journals/corr/1705.074262The Do's and Don'ts for CNN-based Face VerificationEddie Smolansky# Metadata
* **Title**: The Do’s and Don’ts for CNN-based Face Verification
* **Authors**: Ankan Bansal, Carlos Castillo, Rajeev Ranjan, Rama Chellappa (UMIACS, University of Maryland, College Park)
* **Link**:
# Abstract
>Convolutional neural networks (CNN) have become the most sought after tools for addressing object recognition problems. Specifically, they have produced state-of-the-art results for unconstrained face recognition and verification tasks. While the research community appears ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07426#eddiesmolansky
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07426#eddiesmolanskySun, 23 Sep 2018 20:34:23 060010.21105/joss.006762OPEM : Open Source PEM Cell Simulation ToolSepand HaghighiModeling and simulation of proton-exchange membrane fuel cells (PEMFC) may work as a powerful tool in the research and development of renewable energy sources. The Open-Source PEMFC Simulation Tool (OPEM) is a modeling tool for evaluating the performance of proton-exchange membrane fuel cells. This package is a combination of models (static/dynamic) that predict the optimum operating parameters of PEMFC. OPEM contains generic models that accept as input not only values of the operating vari...
http://www.shortscience.org/paper?bibtexKey=10.21105/joss.00676#sepandhaghighi
http://www.shortscience.org/paper?bibtexKey=10.21105/joss.00676#sepandhaghighiSat, 08 Sep 2018 10:00:26 06001808.07371journals/corr/1808.073715Everybody Dance NowOleksandr BailoThis paper presents a per-frame image-to-image translation system enabling copying the motion of a person from a source video to a target person. For example, a source video might be a professional dancer performing complicated moves, while the target person is you. By utilizing this approach, it is possible to generate a video of you dancing as a professional. Check the authors' [video]() for the visual explanation.
**Data preparation**
The authors have manually recorded high-resolution vide...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1808.07371#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1808.07371#ukrdailoWed, 05 Sep 2018 07:15:05 06001804.02341journals/corr/1804.023412Compositional Obverter Communication Learning From Raw Visual InputBen BoginThis paper proposes a new training method for multi-agent communication settings. They show the following referential game: A speaker sees an image of a 3D-rendered object and describes it to a listener. The listener sees a different image and must decide if it is the same object as described by the speaker (has the same color and shape). The game can only be completed successfully if a communication protocol emerges that can express the color and shape the speaker sees.
The main contribution o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02341#benbogin
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02341#benboginSun, 02 Sep 2018 21:04:11 060010.21105/joss.007292PyCM: Multiclass confusion matrix library in PythonSepand HaghighiPyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix input, and is a proper tool for post-classification model evaluation that supports most class and overall statistics parameters. PyCM is the Swiss-army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers.
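For intuition, the object such a library computes from two label vectors is just this (a NumPy sketch of the underlying concept, with my own toy labels; this is not PyCM's actual API):

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """Multi-class confusion matrix: rows = actual class, cols = predicted."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

actual    = [0, 1, 2, 2, 1, 0]
predicted = [0, 2, 2, 1, 1, 0]
cm = confusion_matrix(actual, predicted, 3)

# One "overall statistic": accuracy is the trace over the total count.
overall_accuracy = np.trace(cm) / cm.sum()
```

Per-class statistics (precision, recall, and the many others such libraries report) are all row/column reductions of the same matrix.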
http://www.shortscience.org/paper?bibtexKey=10.21105/joss.00729#sepandhaghighi
http://www.shortscience.org/paper?bibtexKey=10.21105/joss.00729#sepandhaghighiSat, 01 Sep 2018 22:20:36 060010.1111/cdep.122822From Babies to Robots: The Contribution of Developmental Robotics to Developmental PsychologyNatalia Diaz Rodriguez, PhDJoint summary from
Developmental robotics is the interdisciplinary approach to the autonomous design of behavioural and cognitive capabilities in artificial agents (robots) that takes direct inspiration from the developmental principles and mechanisms observed in the natural cognitive systems. It relies on a highly interdisciplinary effort of empirical developmental sciences such as developmental psychology, neuroscience, and comparative psychology, and computational and engineering disciplin...
http://www.shortscience.org/paper?bibtexKey=10.1111/cdep.12282#natalia
http://www.shortscience.org/paper?bibtexKey=10.1111/cdep.12282#nataliaThu, 23 Aug 2018 09:55:44 06001709.04326journals/corr/1709.043263Learning with Opponent-Learning AwarenessmnoukhovNormal RL agents in multi-agent scenarios treat their opponents as a static part of the environment, not taking into account the fact that other agents are learning as well. This paper proposes LOLA, a learning rule that should take the agency and learning of opponents into account by optimizing "return under one step lookahead of opponent learning"
So instead of optimizing under the current parameters of agent 1 and 2
$$V^1(\theta_i^1, \theta_i^2)$$
LOLA proposes to optimize taking into acc...
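The lookahead idea can be made concrete on a toy two-player differentiable game (hypothetical quadratic payoffs of my choosing, not the paper's iterated games): agent 1 differentiates its value *through* the opponent's anticipated gradient step.

```python
def V1(t1, t2):  # agent 1 wants t1 close to t2
    return -(t1 - t2) ** 2

def V2(t1, t2):  # agent 2 wants t1 + t2 close to 0
    return -(t1 + t2) ** 2

def grad(f, t1, t2, wrt, eps=1e-5):
    """Central finite-difference gradient of f w.r.t. argument `wrt`."""
    if wrt == 0:
        return (f(t1 + eps, t2) - f(t1 - eps, t2)) / (2 * eps)
    return (f(t1, t2 + eps) - f(t1, t2 - eps)) / (2 * eps)

lr = 0.1
t1, t2 = 1.0, -1.0

# Naive learner: gradient of V1 at the current joint parameters.
naive_g1 = grad(V1, t1, t2, wrt=0)

# LOLA-style learner: evaluate V1 after the opponent's imagined update
# t2' = t2 + lr * dV2/dt2. Note t2' depends on t1 -- differentiating
# through that dependence is what produces LOLA's extra correction term.
def lookahead_V1(t1_):
    t2_new = t2 + lr * grad(V2, t1_, t2, wrt=1)
    return V1(t1_, t2_new)

eps = 1e-5
lola_g1 = (lookahead_V1(t1 + eps) - lookahead_V1(t1 - eps)) / (2 * eps)
```

In practice the paper uses exact (auto-)gradients rather than finite differences; the nesting structure is the same.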
http://www.shortscience.org/paper?bibtexKey=journals/corr/1709.04326#mnoukhov
http://www.shortscience.org/paper?bibtexKey=journals/corr/1709.04326#mnoukhovMon, 13 Aug 2018 23:01:16 06001805.09733journals/corr/1805.097333Towards Robust Evaluations of Continual LearningNatalia Diaz Rodriguez, PhDThrough a likelihood-focused derivation of a variational inference (VI) loss, Variational Generative Experience Replay (VGER) presents the closest appropriate likelihood-focused alternative to Variational Continual Learning (VCL), the state-of-the-art prior-focused approach to continual learning.
In non-continual learning, the aim is to learn parameters $\omega$ using labelled training data $\mathcal{D}$ to infer $p(y \vert \omega, x)$. In the continual learning context, instead, the data is not in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.09733#natalia
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.09733#nataliaFri, 10 Aug 2018 11:27:55 06001806.06621journals/corr/1806.066212Banach Wasserstein GANArtëm SobolevThe paper extends the [WGAN]() paper by replacing the L2 norm in the transportation cost by some other metric $d(x, y)$. By following the same reasoning as in the WGAN paper one arrives at a dual optimization problem similar to the WGAN's one except that the critic $f$ has to be 1-Lipschitz w.r.t. a given norm (rather than L2). This, in turn, means that the critic's gradient (w.r.t. input $x$) has to be bounded in the dual norm (only in Banach spaces, hence the name). Authors build upon the [WGAN-GP...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.06621#artems
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.06621#artemsTue, 31 Jul 2018 10:05:24 0600conf/naacl/JagannathaY162Bidirectional RNN for Medical Event Detection in Electronic Health RecordsJoseph Paul CohenThe basic approach is an RNN applied to text to predict a medical event such as an ICD code. It is unclear if the complicated BiRNN model is required.
This has some useful applications such as:
* Adapt old databases
* Correct errors
* Upgrade ICD versions
A simple diagram of an RNN applied to medical text is shown below:
http://www.shortscience.org/paper?bibtexKey=conf/naacl/JagannathaY16#joecohen
http://www.shortscience.org/paper?bibtexKey=conf/naacl/JagannathaY16#joecohenSat, 28 Jul 2018 18:13:55 06001602.05568journals/corr/1602.055682Multilayer Representation Learning for Medical ConceptsJoseph Paul CohenThis model called Med2Vec is inspired by Word2Vec. It is Word2Vec for time series patient visits with ICD codes. The model learns embeddings for medical codes as well as the demographics of patients.
The context is temporal. For each $x_t$ as input, the model predicts $x_{t+1}$ and $x_{t-1}$, or more depending on the temporal window size.
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.05568#joecohen
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.05568#joecohenSat, 28 Jul 2018 18:09:03 06001802.00400journals/corr/1802.004002A Comparison of Word Embeddings for the Biomedical Natural Language ProcessingJoseph Paul CohenThis paper demonstrates that Word2Vec \cite{1301.3781} can extract relationships between words and produce latent representations useful for medical data. They explore this model on different datasets which yield different relationships between words.
The Word2Vec model works like an autoencoder that predicts the context of a word. The context of a word is composed of the surrounding words, as shown below. Given the word in the center, the neighboring words are predicted through a bottleneck in...
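The center-word-to-context prediction can be sketched end to end. This is a tiny skip-gram-style toy of my own (integer tokens, softmax over the full vocabulary), not the paper's model or the original Word2Vec implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = [[0, 1, 2], [2, 3, 4], [0, 2, 4]]   # toy "sentences" of token ids
V, H, lr = 5, 3, 0.1                         # vocab size, bottleneck, step
W_in = rng.normal(0, 0.1, (V, H))            # rows become the embeddings
W_out = rng.normal(0, 0.1, (H, V))           # context-prediction weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(300):
    for sent in corpus:
        for i, center in enumerate(sent):
            for j in (i - 1, i + 1):          # context window of size 1
                if 0 <= j < len(sent):
                    h = W_in[center]          # bottleneck activation
                    p = softmax(h @ W_out)    # predicted context distribution
                    dlogit = p.copy()
                    dlogit[sent[j]] -= 1.0    # softmax cross-entropy gradient
                    dh = W_out @ dlogit
                    W_out -= lr * np.outer(h, dlogit)
                    W_in[center] -= lr * dh

# After training, W_in[t] is token t's embedding, and the model assigns
# high probability to each token's true neighbors.
```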
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.00400#joecohen
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.00400#joecohenSat, 28 Jul 2018 17:33:08 060010.1007/9783319917047_113An Experimental Evaluation of the Generalizing Capabilities of Process Discovery Techniques and BlackBox Sequence ModelsNiek Tax# Contributions
The contribution of this paper is threefold:
1. We present a method to use *process models* as interpretable sequence models that have a stronger notion of interpretability than what is generally used in the machine learning field (see Section *process models* below),
2. We show that this approach enables the comparison of traditional sequence models (RNNs, LSTMs, Markov Models) with techniques from the research field of *automated process discovery*,
3. We show on a collection ...
http://www.shortscience.org/paper?bibtexKey=10.1007/9783319917047_11#niektax
http://www.shortscience.org/paper?bibtexKey=10.1007/9783319917047_11#niektaxWed, 25 Jul 2018 08:13:25 06001505.05770journals/corr/1505.057703Variational Inference with Normalizing FlowsCodyWildThis paper argues for the use of normalizing flows (a way of building up new probability distributions by applying multiple sets of invertible transformations to existing distributions) as a way of building more flexible variational inference models.
The central premise of a variational autoencoder is that of learning an approximation to the posterior distribution of latent variables, $p(z \vert x)$, and parameterizing that distribution according to values produced by a neural network. In typical ...
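The mechanism behind a flow is the change-of-variables formula, which stays tractable as long as each transform is invertible with a computable Jacobian determinant. A minimal sketch on the simplest possible transform (a scalar affine map of my choosing, far simpler than the planar/radial flows in the paper):

```python
import numpy as np

# If z' = f(z) = a*z + b, then log q1(z') = log q0(z) - log|det df/dz|.
rng = np.random.default_rng(1)
a, b = 2.0, 0.5
z0 = rng.standard_normal(10_000)   # samples from the base density N(0, 1)
z1 = a * z0 + b                    # samples pushed through the "flow"

def log_q0(z):
    """Log-density of the standard normal base distribution."""
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

# Exact log-density of the transformed samples -- no integration needed.
log_q1 = log_q0(z0) - np.log(abs(a))
```

Stacking many such invertible maps (with richer transforms) gives a flexible approximate posterior whose log-density remains exactly computable, which is what variational inference needs.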
http://www.shortscience.org/paper?bibtexKey=journals/corr/1505.05770#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1505.05770#decodyngMon, 23 Jul 2018 15:34:55 06001711.09081journals/corr/1711.090813Deep Extreme Cut: From Extreme Points to Object SegmentationOleksandr BailoThis paper introduces a CNN based segmentation of an object that is defined by a user using four extreme points (i.e. bounding box). Interestingly, in a related work, it has been shown that clicking extreme points is about 5 times more efficient than drawing a bounding box in terms of speed.
The extreme points have several goals in this work. First, they are used as a bounding box to crop the object of interest. Secondly, they are utilized to create a heatmap with activations in the regions o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09081#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09081#ukrdailoMon, 23 Jul 2018 02:25:26 06001803.09693journals/corr/1803.096933Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++Oleksandr BailoIn this paper, the authors develop a system for automatic as well as an interactive annotation (i.e. segmentation) of a dataset. In the automatic mode, bounding boxes are generated by another network (e.g. Faster R-CNN), while in the interactive mode, the input bounding box around an object of interest comes from the human in the loop.
The system is composed of the following parts:
1. **Residual encoder with skip connections**. This step acts as a feature extractor. The ResNet-50 with a few modifi...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.09693#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.09693#ukrdailoSun, 22 Jul 2018 07:51:07 06001806.10474journals/corr/1806.104742The challenge of realistic music generation: modelling raw audio at scaleCodyWildThis paper draws from two strains of recent work: the hierarchical music modeling of MusicVAE, which intentionally models musical structure at both local and more global levels, and the discrete autoencoder approaches of Vector Quantized VAEs, which seek to maintain the overall structure of a VAE but apply a less aggressive form of regularization.
The goal of this paper is to build a model that can generate music, not from that music’s symbolic representation (lists of notes) but from ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.10474#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.10474#decodyngSun, 22 Jul 2018 05:48:19 06001807.01604journals/corr/1807.016042Quasi-Monte Carlo Variational InferenceArtëm SobolevVariational Inference builds around the ELBO (Evidence Lower BOund), a lower bound on the marginal log-likelihood of the observed data, $\log p(x) = \log \int p(x, z) \, dz$ (which is typically intractable). The ELBO makes use of an approximate posterior to form a lower bound:
$$
\log p(x) \ge \mathbb{E}_{q(z \vert x)} \log \frac{p(x, z)}{q(z \vert x)}
$$
# Introduction to Quasi-Monte Carlo
It's assumed that both the joint $p(x, z)$ (or, equivalently, the likelihood $p(x \vert z)$ and the prior $p(z)$) and the appro...
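The core move is to replace i.i.d. uniform draws with a low-discrepancy sequence before reparameterizing. A toy sketch (my own integrand $z^2$, whose expectation under $\mathcal{N}(0,1)$ is 1, standing in for the ELBO integrand; the paper's experiments use actual VI objectives):

```python
import numpy as np
from statistics import NormalDist

def van_der_corput(n, base=2):
    """First n points of the base-b van der Corput low-discrepancy sequence."""
    pts = []
    for i in range(1, n + 1):
        x, f = 0.0, 1.0 / base
        while i > 0:
            i, r = divmod(i, base)  # peel off digits of i in the given base
            x += r * f
            f /= base
        pts.append(x)
    return np.array(pts)

u = van_der_corput(1024)                            # low-discrepancy uniforms
z = np.array([NormalDist().inv_cdf(v) for v in u])  # map to q(z) = N(0, 1)

# Estimate E_q[z^2] (true value 1); in VI the integrand would instead be
# log p(x, z) - log q(z|x) evaluated at the reparameterized z.
qmc_estimate = (z ** 2).mean()
```

Because the points fill the unit interval more evenly than i.i.d. samples, the estimator's error typically shrinks faster than the plain Monte Carlo $O(n^{-1/2})$ rate for smooth integrands.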
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.01604#artems
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.01604#artemsFri, 20 Jul 2018 11:01:35 06001709.04326journals/corr/1709.043264Learning with Opponent-Learning AwarenessCodyWildA central question of this paper is: under what circumstances will you see agents that have been trained to optimize their own reward implement strategies, like tit for tat, that are more sophisticated and achieve higher overall reward than each agent simply pursuing its dominant strategy. The games under consideration here are “general sum” games like Iterated Prisoner’s Dilemma, where each agent’s dominant strategy is to defect, but with some amount of coordination or reciprocity, better...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1709.04326#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1709.04326#decodyngThu, 19 Jul 2018 16:55:33 06001806.05759journals/corr/1806.057592Insights on representational similarity in neural networks with canonical correlationCodyWildThe overall goal of the paper is to measure how similar different layer activation profiles are to one another, in hopes of being able to quantify the similarity of the representations that different layers are learning. If you had a measure that captured this, you could ask questions like: “how similar are the representations that are learned by different networks on the same task”, and “what is the dynamic of representational change in a given layer throughout training”?
Canonical Corre...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.05759#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.05759#decodyngTue, 17 Jul 2018 23:18:12 06001802.07535journals/corr/1802.075353BRUNO: A Deep Recurrent Model for Exchangeable DataArtëm SobolevIf one is a Bayesian, he or she best expresses beliefs about the next observation $x_{n+1}$ after observing $x_1, \dots, x_n$ using the **posterior predictive distribution**: $p(x_{n+1}\vert x_1, \dots, x_n)$. Typically one invokes the de Finetti theorem and assumes there exists an underlying model $p(x\vert\theta)$, hence $p(x_{n+1}\vert x_1, \dots, x_n) = \int p(x_{n+1} \vert \theta) p(\theta \vert x_1, \dots, x_n) d\theta$; however, this integral is far from tractable in most cases. Nevertheless, h...
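For intuition, one of the rare cases where that integral has a closed form (a textbook conjugate Beta-Bernoulli example of mine, unrelated to BRUNO's recurrent construction):

```python
# With prior theta ~ Beta(a, b) and observations x_i ~ Bernoulli(theta),
# the de Finetti integral collapses to
#   p(x_{n+1} = 1 | x_1..x_n) = (a + k) / (a + b + n),
# where k is the number of observed ones among the n observations.
def posterior_predictive(xs, a=1.0, b=1.0):
    k, n = sum(xs), len(xs)
    return (a + k) / (a + b + n)

p = posterior_predictive([1, 1, 0, 1])  # 3 ones in 4 flips, uniform prior
```

BRUNO's point is to get a tractable, exchangeable posterior predictive like this for far richer data, where no conjugate shortcut exists.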
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.07535#artems
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.07535#artemsMon, 09 Jul 2018 17:46:37 06001712.01238journals/corr/1712.012384Learning by Asking QuestionsOleksandr BailoThis paper is about an interactive Visual Question Answering (VQA) setting in which agents must ask questions about images to learn. This closely mimics how people learn from each other using natural language and has a strong potential to learn much faster with fewer data. It is referred to as learning by asking (LBA) throughout the paper. The approach is composed of three models:
1. **Question proposal module** is responsible for generating _important_ questions about the image. It is a combination of...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.01238#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.01238#ukrdailoSun, 08 Jul 2018 12:32:56 06001803.07485journals/corr/1803.074852Actor and Action Video Segmentation from a SentenceOleksandr BailoThis paper performs pixelwise segmentation of the object of interest which is specified by a sentence. The model is composed of three main components: a **textual encoder**, a **video encoder**, and a **decoder**.
* **Textual encoder** is a pretrained word2vec model followed by a 1D CNN.
* **Video encoder** is a 3D CNN to obtain a visual representation of the video (can be combined with optical flow to obtain motion information).
* **Decoder**. Given a sentence representation $T$, a separate filt...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.07485#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.07485#ukrdailoWed, 04 Jul 2018 05:47:29 06001711.11543journals/corr/1711.115433Embodied Question AnsweringOleksandr BailoThis paper introduces a new AI task: Embodied Question Answering. The goal of this task for an agent is to be able to answer the question by observing the environment through a single egocentric RGB camera while being able to navigate inside the environment. The agent has 4 natural modules:
1. **Vision**. 224x224 RGB images are processed by a CNN to produce a fixed-size representation. This CNN is pretrained on pixel-to-pixel tasks such as RGB reconstruction, semantic segmentation, and depth est...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.11543#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.11543#ukrdailoWed, 04 Jul 2018 02:12:50 060010.18653/v1/p1610782Tree-to-Sequence Attentional Neural Machine TranslationTim MillerThis work extends sequence-to-sequence models for machine translation by using syntactic information on the source language side. This paper looks at the translation task where English is the source language, and Japanese is the target language. The dataset is the ASPEC corpus of scientific paper abstracts that seem to be in both English and Japanese? (See note below). The trees for the source (English) are generated by running the ENJU parser on the English data, resulting in binary trees, and ...
http://www.shortscience.org/paper?bibtexKey=10.18653/v1/p161078#tmills
http://www.shortscience.org/paper?bibtexKey=10.18653/v1/p161078#tmillsTue, 03 Jul 2018 15:43:38 06001804.08328journals/corr/1804.083285Taskonomy: Disentangling Task Transfer LearningOleksandr BailoThe goal of this work is to perform transfer learning among numerous tasks and to discover visual relationships among them. Specifically, while we might intuitively guess that the depth of an image and its surface normals are related, this work takes a step forward and discovers a beneficial relationship among 26 tasks in terms of task transferability (many of them are not obvious). This is important for scenarios when an insufficient annotation budget is available for the target task; thus, learned repr...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.08328#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.08328#ukrdailoMon, 02 Jul 2018 02:46:39 06001702.02284journals/corr/1702.022842Adversarial Attacks on Neural Network PoliciesDavid StutzHuang et al. study adversarial attacks on reinforcement learning policies. One of the main problems, in contrast to supervised learning, is that there might not be a reward in any time step, meaning there is no clear objective to use. However, this is essential when crafting adversarial examples as they are mostly based on maximizing the training loss. To avoid this problem, Huang et al. assume a welltrained policy; the policy is expected to output a distribution over actions. Then, adversarial...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.02284#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.02284#davidstutzThu, 28 Jun 2018 19:16:01 06001712.03141journals/corr/1712.031412Wild Patterns: Ten Years After the Rise of Adversarial Machine LearningDavid StutzBiggio and Roli provide a comprehensive survey and discussion of work in adversarial machine learning. In contrast to related work [1,2], they explicitly discuss the relation of recent developments regarding the security of deep neural networks (as primarily discussed in [1] and [2]) and adversarial machine learning in general. The latter can be traced back to early work starting in 2004, e.g. involving adversarial attacks on spam filters. As a result, terminology used by Biggio and Roli is slig...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.03141#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.03141#davidstutzThu, 28 Jun 2018 19:11:16 06001801.00553journals/corr/1801.005532Threat of Adversarial Attacks on Deep Learning in Computer Vision: A SurveyDavid StutzAkhtar and Mian present a comprehensive survey of attacks and defenses of deep neural networks, specifically in computer vision. Published on ArXiv in January 2018, but probably written prior to August 2017, the survey includes recent attacks and defenses. For example, Table 1 presents an overview of attacks on deep neural networks – categorized by knowledge, target and perturbation measure. The authors also provide a strength measure – in the form of a 1-5 star “rating”. Personally, ho...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.00553#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.00553#davidstutzThu, 28 Jun 2018 19:06:48 06001712.07107journals/corr/1712.071072Adversarial Examples: Attacks and Defenses for Deep LearningDavid StutzYuan et al. present a comprehensive survey of attacks, defenses and studies regarding the robustness and security of deep neural networks. Published on ArXiv in December 2017, it includes the most recent attacks and defenses. For example, Table 1 lists all known attacks – Yuan et al. categorize the attacks according to the level of knowledge needed, targeted or non-targeted, the optimization needed (e.g. iterative) as well as the perturbation measure employed. As a result, Table 1 gives a solid o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.07107#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.07107#davidstutzThu, 28 Jun 2018 18:59:29 06001605.01775journals/corr/1605.017752Adversarial Diversity and Hard Positive GenerationDavid StutzRozsa et al. propose PASS, a perceptual similarity metric invariant to homographies, to quantify adversarial perturbations. In particular, PASS is based on the structural similarity metric SSIM [1]; specifically
$\text{PASS}(\tilde{x}, x) = \text{SSIM}(\psi(\tilde{x}, x), x)$
where $\psi(\tilde{x}, x)$ transforms the perturbed image $\tilde{x}$ to the image $x$ by applying a homography $H$ (which can be found through optimization). Based on this similarity metric, they consider additional attacks which creat...
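For a rough feel of the metric, the sketch below computes a single-window SSIM (the reference SSIM averages over local windows) with the homography alignment $\psi$ stubbed out as an identity; `ssim_global`, `pass_score`, and the `align` default are assumptions of this sketch, not the authors' implementation:

```python
import numpy as np

def ssim_global(x, y, L=1.0):
    # Single-window SSIM over the whole image (Wang et al.'s formula);
    # the reference implementation averages SSIM over local windows.
    C1, C2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / (
        (mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

def pass_score(x_adv, x, align=lambda a, b: a):
    # `align` stands in for the homography psi(x_adv, x); the identity
    # default assumes the two images are already registered.
    return ssim_global(align(x_adv, x), x)
```

An unchanged image scores exactly 1, and any genuine perturbation pushes the score below 1, which is what makes the metric usable as a perceptibility measure.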
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.01775#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.01775#davidstutzThu, 28 Jun 2018 18:32:44 06001605.07262journals/corr/1605.072622Measuring Neural Net Robustness with ConstraintsDavid StutzBastani et al. propose formal robustness measures and an algorithm for approximating them for piecewise linear networks. Specifically, the notion of robustness is similar to related work:
$\rho(f,x) = \inf\{\epsilon \geq 0 \mid f \text{ is not } (x,\epsilon)\text{-robust}\}$
where $(x,\epsilon)$-robustness demands that for every $x'$ with $\|x' - x\|_\infty \leq \epsilon$ it holds that $f(x') = f(x)$ – in other words, the label does not change for perturbations $\eta = x' - x$ which are small in terms of the $L_\...
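For intuition, this measure has a closed form for a plain linear classifier under the $L_\infty$ norm. A sketch (Bastani et al. approximate $\rho$ for piecewise-linear *networks* via linear programs, so this linear special case is purely illustrative):

```python
import numpy as np

def linf_robustness_linear(w, b, x):
    # Closed-form rho(f, x) for f(x) = sign(w.x + b) under the L_inf norm:
    # the smallest eps that can flip the sign is |w.x + b| / ||w||_1,
    # achieved by the perturbation eta = -sign(w.x + b) * eps * sign(w).
    return abs(w @ x + b) / np.abs(w).sum()
```

The value can be sanity-checked by perturbing just above and just below the returned radius: the label flips in the first case and stays put in the second.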
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.07262#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.07262#davidstutzThu, 28 Jun 2018 18:23:07 06001711.10925journals/corr/1711.109254Deep Image PriorDavid StutzUlyanov et al. utilize untrained neural networks as a regularizer/prior for various image restoration tasks such as denoising, inpainting and super-resolution. In particular, the standard formulation of such tasks, i.e.
$x^\ast = \arg\min_x E(x, x_0) + R(x)$
where $x_0$ is the input image and $E$ a task-dependent data term, is rephrased as follows:
$\theta^\ast = \arg\min_\theta E(f_\theta(z); x_0)$ and $x^\ast = f_{\theta^\ast}(z)$
for a fixed but random $z$. Here, the regularizer $R$ is esse...
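A toy sketch of this reparameterization, assuming a tiny fully-connected $f_\theta$ and plain gradient descent on a squared-error data term (the paper uses a convolutional encoder-decoder and early stopping; the function name and all hyperparameters here are illustrative):

```python
import numpy as np

def deep_image_prior_fit(x0, hidden=32, steps=200, lr=1e-2, seed=0):
    rng = np.random.RandomState(seed)
    z = rng.randn(hidden)                  # fixed but random input z
    W1 = 0.1 * rng.randn(hidden, hidden)   # theta = (W1, W2), random init
    W2 = 0.1 * rng.randn(x0.size, hidden)
    losses = []
    for _ in range(steps):
        a = W1 @ z
        h = np.maximum(a, 0.0)             # ReLU
        out = W2 @ h                       # f_theta(z)
        r = out - x0                       # residual of the data term E
        losses.append(float(r @ r))
        gW2 = 2.0 * np.outer(r, h)                        # dE/dW2
        gW1 = np.outer((W2.T @ (2.0 * r)) * (a > 0), z)   # dE/dW1
        W1 -= lr * gW1
        W2 -= lr * gW2
    return out, losses
```

The point of the reformulation is that only $\theta$ is optimized while $z$ stays fixed, so the network architecture itself acts as the regularizer $R$.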
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.10925#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.10925#davidstutzThu, 28 Jun 2018 18:14:51 06001801.02774journals/corr/1801.027742Adversarial SpheresDavid StutzGilmer et al. study the existence of adversarial examples on a synthetic toy dataset consisting of two concentric spheres. The dataset is created by randomly sampling examples from two concentric spheres, one with radius $1$ and one with radius $R = 1.3$. While the authors argue that different difficulties of the dataset can be created by varying $R$ and the dimensionality, they merely experiment with $R = 1.3$ and a dimensionality of $500$. The motivation to study this dataset comes from the ...
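The described sampling process can be sketched in a few lines, assuming points are drawn uniformly on each sphere by normalizing Gaussian samples (a standard construction; the function name and seed handling are illustrative, while $R = 1.3$ and $d = 500$ match the setting the authors use):

```python
import numpy as np

def sample_spheres(n, dim=500, r_outer=1.3, seed=0):
    # Label 0: unit sphere; label 1: concentric sphere of radius R = 1.3.
    rng = np.random.RandomState(seed)
    y = rng.randint(0, 2, size=n)
    g = rng.randn(n, dim)
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # uniform on unit sphere
    radii = np.where(y == 1, r_outer, 1.0)
    return g * radii[:, None], y
```

A classifier on this dataset only needs to threshold the input norm, which is what makes any adversarial example on it so instructive.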
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02774#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02774#davidstutzThu, 28 Jun 2018 18:02:30 06001608.08967journals/corr/1608.089673Robustness of classifiers: from adversarial to random noiseDavid StutzFawzi et al. study robustness in the transition from random samples to semi-random and adversarial samples. Specifically, they present bounds relating the norm of an adversarial perturbation to the norm of random perturbations – for the exact form I refer to the paper. Personally, I find the definition of semi-random noise most interesting, as it allows one to build an intuition for distinguishing random noise from adversarial examples. As in related literature, adversarial examples are defined as
...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.08967#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.08967#davidstutzThu, 28 Jun 2018 17:54:18 06001608.07690journals/corr/1608.076903A Boundary Tilting Persepective on the Phenomenon of Adversarial ExamplesDavid StutzTanay and Griffin introduce the boundary tilting perspective as an alternative to the “linear explanation” for adversarial examples. Specifically, they argue that it is not reasonable to assume that the linearity in deep neural networks causes the existence of adversarial examples. Originally, Goodfellow et al. [1] explained the impact of adversarial examples by considering a linear classifier:
$w^T x' = w^Tx + w^T\eta$
where $\eta$ is the adversarial perturbations. In large dimensions, the s...
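The linearity argument being disputed is easy to check numerically: with the max-norm perturbation $\eta = \epsilon \cdot \text{sign}(w)$, the activation shift $w^T\eta = \epsilon\|w\|_1$ grows linearly with the input dimension. A small sketch (the function name is illustrative):

```python
import numpy as np

def activation_shift(w, eps=0.01):
    # w^T eta for eta = eps * sign(w): every coordinate of eta contributes
    # with the same sign as w, so the shift equals eps * ||w||_1 and
    # therefore scales with the dimensionality of w.
    eta = eps * np.sign(w)
    return w @ eta
```

So a per-coordinate perturbation too small to notice can still swing the activation arbitrarily far in high dimensions — the effect Tanay and Griffin argue is not, by itself, the right explanation.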
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.07690#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.07690#davidstutzThu, 28 Jun 2018 17:50:32 06001801.09344journals/corr/1801.093443Certified Defenses against Adversarial ExamplesDavid StutzRaghunathan et al. provide an upper bound on the adversarial loss of two-layer networks and also derive a regularization method to minimize this upper bound. In particular, the authors consider the scoring functions $f^i(x) = V_i^T\sigma(Wx)$ with bounded derivative $\sigma'(z) \in [0,1]$, which holds for Sigmoid and ReLU activation functions. Still, the model is very constrained compared to recent, well-performing deep (convolutional) neural networks. The upper bound is then derived by considerin...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.09344#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.09344#davidstutzThu, 28 Jun 2018 17:41:17 0600conf/icml/CisseBGDU173Parseval Networks: Improving Robustness to Adversarial ExamplesDavid StutzCisse et al. propose Parseval networks, deep neural networks regularized to learn orthonormal weight matrices. Similar to the work by Hein et al. [1], the main idea is to constrain the Lipschitz constant of the network – which essentially means constraining the Lipschitz constant of each layer independently. For weight matrices, this can be achieved by constraining the matrix norm. However, this (depending on the norm used) is often intractable during gradient descent training. Therefore, Cis...
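The tractable layer-wise update Cisse et al. use is the retraction $W \leftarrow (1+\beta)W - \beta W W^T W$, which pushes the rows of $W$ toward an orthonormal system. A sketch iterating it to convergence — note that in training a tiny $\beta$ is applied once after each gradient step, so the repeated large-$\beta$ iteration here is only to demonstrate that $W W^T \to I$ (which requires the initial singular values to lie in $(0, \sqrt{3})$):

```python
import numpy as np

def parseval_retraction(W, beta=0.5, iters=30):
    # One Parseval tightness step is W <- (1 + beta) W - beta W W^T W;
    # iterating it drives W W^T toward the identity, i.e. orthonormal rows,
    # provided the singular values of W start in (0, sqrt(3)).
    for _ in range(iters):
        W = (1 + beta) * W - beta * (W @ W.T @ W)
    return W
```

Orthonormal rows give the layer a Lipschitz constant of 1 in the $L_2$ sense, which is exactly the per-layer constraint the summary describes.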
http://www.shortscience.org/paper?bibtexKey=conf/icml/CisseBGDU17#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/icml/CisseBGDU17#davidstutzThu, 28 Jun 2018 17:33:43 0600