ShortScience.org Latest Summaries
http://www.shortscience.org/
Model-Based Active Exploration (arXiv 1810.12162), summarized by CodyWild, Sun, 18 Nov 2018 21:31:02 -0700
This paper continues in the tradition of curiosity-based models, which try to reward models for exploring novel parts of their environment, in the hopes this can intrinsically motivate learning. However, this paper argues that it's insufficient to just treat novelty as an occasional bonus on top of a normal reward function, and that instead you should figure out a process that's more specifically designed to increase novelty. Specifically: you should design a policy whose goal is to experien...
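The "occasional novelty bonus" setup that the paper argues against can be sketched as a count-based intrinsic reward added on top of the external one. The state labels, the 1/sqrt(N) bonus form, and the coefficient below are illustrative assumptions, not details from the paper:

```python
import math
from collections import defaultdict

# Novelty as a bonus on top of the normal reward: the more often a state
# has been visited, the smaller the intrinsic addition becomes.
visit_counts = defaultdict(int)

def reward_with_novelty_bonus(state, external_reward, beta=0.1):
    """Return the external reward plus a count-based novelty bonus."""
    visit_counts[state] += 1
    bonus = beta / math.sqrt(visit_counts[state])
    return external_reward + bonus

# The first visit to a state earns the largest bonus; repeats earn less.
first = reward_with_novelty_bonus("s0", 0.0)
repeat = reward_with_novelty_bonus("s0", 0.0)
```

The paper's point is that this kind of additive bonus is a weaker objective than a policy explicitly optimized to seek novel experience.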
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.12162#decodyng
Episodic Curiosity through Reachability (arXiv 1810.02274), summarized by CodyWild, Sat, 17 Nov 2018 07:30:01 -0700
This paper proposes a new curiosity-based intrinsic reward technique that seeks to address one of the failure modes of previous curiosity methods. The basic idea of curiosity is that, often, exploring novel areas of an environment can be correlated with gaining reward within that environment, and that we can find ways to incentivize the former that don't require a hand-designed reward function. This is appealing because many useful-to-learn environments either lack inherent reward altogether, ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.02274#decodyng
Large-Scale Study of Curiosity-Driven Learning (arXiv 1808.04355), summarized by CodyWild, Fri, 16 Nov 2018 02:45:07 -0700
I really enjoyed this paper: in addition to being a clean, fundamentally empirical work, it was also clearly written, and had some pretty delightful moments of quotable zen, which I'll reference at the end. The paper's goal is to figure out how far curiosity-driven learning alone can take reinforcement learning systems, without the presence of an external reward signal. "Intrinsic" reward learning is when you construct a reward out of internal, inherent features of the environment, rath...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1808.04355#decodyng
Multi-task Deep Reinforcement Learning with PopArt (arXiv 1809.04474), summarized by CodyWild, Thu, 15 Nov 2018 05:45:55 -0700
This paper posits that one of the central problems stopping multi-task RL (that is, single models trained to perform multiple tasks well) from reaching better performance is the inability to balance model resources and capacity between the different tasks the model is being asked to learn. Empirically, prior to this paper, multi-task RL could reach ~50% of human accuracy on Atari and DeepMind Lab tasks. The fact that this is lower than human accuracy is actually somewhat less salient than the...
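The balancing mechanism PopArt is named for (Preserving Outputs Precisely while Adaptively Rescaling Targets) can be sketched in scalar form: whenever the per-task return statistics are updated, rescale the value head's weights so the unnormalized predictions are unchanged. This is a simplified single-output sketch with plain running statistics; the class and variable names are assumptions for illustration:

```python
class PopArtHead:
    def __init__(self):
        self.w, self.b = 1.0, 0.0       # linear head in normalized space
        self.mu, self.sigma = 0.0, 1.0  # statistics of the task's returns

    def unnormalized(self, h):
        """Prediction mapped back to the original return scale."""
        return self.sigma * (self.w * h + self.b) + self.mu

    def update_stats(self, new_mu, new_sigma):
        # Preserve outputs: compensate w and b for the new statistics.
        self.w = self.w * self.sigma / new_sigma
        self.b = (self.sigma * self.b + self.mu - new_mu) / new_sigma
        self.mu, self.sigma = new_mu, new_sigma

head = PopArtHead()
before = head.unnormalized(0.5)
head.update_stats(new_mu=10.0, new_sigma=5.0)
after = head.unnormalized(0.5)  # identical despite the rescaled statistics
```

Normalizing each task's value targets this way stops tasks with large reward scales from dominating the shared network's capacity.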
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.04474#decodyng
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (arXiv 1802.01561), summarized by CodyWild, Tue, 13 Nov 2018 08:26:54 -0700
This reinforcement learning paper starts with the constraints imposed by an engineering problem (the need to scale up learning to operate across many GPUs) and ends up, as a result, needing to solve an algorithmic problem along with it.
In order to massively scale up their training to be able to train multiple problem domains in a single model, the authors of this paper implemented a system whereby many "worker" nodes execute trajectories (series of actions, states, and rewards) an...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.01561#decodyng
Language GANs Falling Short (arXiv 1811.02549), summarized by CodyWild, Mon, 12 Nov 2018 08:19:15 -0700
This paper's high-level goal is to evaluate how well GAN-type structures for generating text are performing, compared to more traditional maximum likelihood methods. In the process, it zooms in on the ways that the current set of metrics for comparing text generation fails to give a well-rounded picture of how models are performing.
In the old paradigm of maximum likelihood estimation, models were both trained and evaluated on maximizing the likelihood of each word, given the prior words in...
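That maximum-likelihood criterion, the likelihood of each word given the prior words, can be sketched with a toy bigram model; the tiny corpus and the order-1 context are invented simplifications of what a real language model would use:

```python
import math
from collections import Counter

# Train by counting: p(word | prev) = count(prev, word) / count(prev).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def log_likelihood(sentence):
    """Sum of log p(word | previous word) under the MLE bigram model."""
    total = 0.0
    for prev, word in zip(sentence, sentence[1:]):
        total += math.log(bigrams[(prev, word)] / unigrams[prev])
    return total

# Evaluation in this paradigm scores the same quantity the training
# objective maximized.
seen = log_likelihood("the cat sat".split())
```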
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.02549#decodyng
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient (arXiv 1609.05473), summarized by CodyWild, Sat, 10 Nov 2018 08:20:21 -0700
GANs for images have made impressive progress in recent years, reaching ever-higher levels of subjective realism. It's also interesting to think about domains where the GAN architecture is less of a good fit. An example of one such domain is natural language.
As opposed to images, which are made of continuous pixel values, sentences are fundamentally sequences of discrete values: that is, words. In a GAN, when the discriminator makes its assessment of the realness of the image, the gradient ...
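Because sampling a discrete word blocks gradients from the discriminator, SeqGAN-style training treats the generator as a policy and the discriminator's realness score as a reward, updating with REINFORCE. The two-word vocabulary and the stand-in "discriminator" below are invented toys, not the paper's setup:

```python
import math, random

random.seed(0)

vocab = ["real_word", "fake_word"]
logits = [0.0, 0.0]  # the generator's unnormalized preferences

def discriminator_reward(word):
    # Toy stand-in for a discriminator's realness score.
    return 1.0 if word == "real_word" else 0.0

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    return [e / sum(exps) for e in exps]

# REINFORCE: scale grad log p(sampled word) by the reward.
for _ in range(200):
    probs = softmax(logits)
    i = random.choices(range(len(vocab)), weights=probs)[0]
    reward = discriminator_reward(vocab[i])
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]  # d log p(i) / d logit j
        logits[j] += 0.5 * reward * grad

probs = softmax(logits)  # mass shifts toward the rewarded word
```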
http://www.shortscience.org/paper?bibtexKey=journals/corr/1609.05473#decodyng
On the Evaluation of Common-Sense Reasoning in Natural Language Understanding (arXiv 1811.01778), summarized by CodyWild, Fri, 09 Nov 2018 05:29:01 -0700
I should say from the outset: I have a lot of fondness for this paper. It goes upstream of a lot of research-community incentives: it's not methodologically flashy, and it's not about beating the state of the art with a bigger, better model (though those papers certainly also have their place). The goal of this paper was, instead, to dive into a test set used to evaluate the performance of models, and try to understand to what extent it's really providing a rigorous test of what we want out of mo...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.01778#decodyng
Trellis Networks for Sequence Modeling (arXiv 1810.06682), summarized by CodyWild, Wed, 07 Nov 2018 04:56:41 -0700
For solving sequence modeling problems, recurrent architectures have historically been the most commonly used solution, but recently temporal convolution networks, especially with dilations to help capture longer-term dependencies, have gained prominence. RNNs theoretically have much larger capacity to learn long sequences, but also have a lot of difficulty propagating signal forward through long chains of recurrent operations. This paper, which suggests the approach of Trellis Networks, place...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.06682#decodyng
Embedding Grammars (arXiv 1808.04891), summarized by CodyWild, Mon, 05 Nov 2018 07:46:39 -0700
This paper is, on the whole, a refreshing jaunt into the applied side of the research world. It isn't looking to solve a fundamental machine learning problem in some new way, but it does highlight and explore one potentially beneficial application of a common and widely used technique: specifically, combining word embeddings with context-free grammars (such as regular expressions) to make the latter less rigid.
Regular expressions work by specifying specific hard-coded patterns of symbols, and...
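The core relaxation can be sketched as follows: a grammar terminal matches not only its literal token but any token whose word embedding is close enough. The 3-d vectors and the 0.8 cosine threshold are made up for the example; a real system would use pretrained embeddings:

```python
import math

# Toy embeddings: "purchase" is near "buy"; "banana" is not.
embeddings = {
    "buy":      [0.90, 0.10, 0.00],
    "purchase": [0.85, 0.15, 0.05],
    "banana":   [0.00, 0.20, 0.90],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def soft_match(terminal, token, threshold=0.8):
    """Relaxed terminal match: exact, or nearby in embedding space."""
    if terminal == token:
        return True
    return cosine(embeddings[terminal], embeddings[token]) >= threshold
```

A grammar rule written against the terminal "buy" would then also fire on "purchase", without the grammar author enumerating every synonym.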
http://www.shortscience.org/paper?bibtexKey=journals/corr/1808.04891#decodyng
You May Not Need Attention (arXiv 1810.13409), summarized by Ofir Press, Sun, 04 Nov 2018 06:52:09 -0700
An attention mechanism and a separate encoder/decoder are two properties of almost every single neural translation model. The question asked in this paper is: how far can we go without attention and without a separate encoder and decoder? And the answer is: pretty far! The model presented performs just as well as the attention model of Bahdanau et al. on the four language directions that are studied in the paper.
The translation model presented in the paper is basically a simple recurrent language mod...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.13409#ofirpress
You May Not Need Attention (arXiv 1810.13409), summarized by CodyWild, Sat, 03 Nov 2018 09:31:24 -0600
I admit it: the title of the paper pulled me in, existing as it does in the chain of weirdly insider-meme papers, starting with Vaswani's 2017 "Attention Is All You Need". That paper has been hugely influential, and the domain of machine translation as a whole has begun to move away from processing (or encoding) source sentences with recurrent architectures, to instead processing them using self-attention architectures. (Self-attention is a little too nuanced to go into in full depth here...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.13409#decodyng
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv 1810.04805), summarized by CodyWild, Sat, 03 Nov 2018 03:11:48 -0600
The last two years have seen a number of improvements in the field of language model pretraining, and BERT (Bidirectional Encoder Representations from Transformers) is the most recent entry into this canon. The general problem posed by language model pretraining is: can we leverage huge amounts of raw text, which aren't labeled for any specific classification task, to help us train better models for supervised language tasks (like translation, question answering, logical entailment, etc.)? Me...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.04805#decodyng
Optimizing Agent Behavior over Long Time Scales by Transporting Value (arXiv 1810.06721), summarized by wassname, Fri, 02 Nov 2018 06:43:01 -0600
This builds on the previous ["MERLIN"]() paper. First they introduce the RMA agent, a simplified version of MERLIN that uses model-based RL and long-term memory. They give the agent long-term memory by letting it choose to save and load the agent's working memory (represented by the LSTM's hidden state).
Then they add credit assignment, similar to the RUDDER paper, to get the "Temporal Value Transport" (TVT) agent that can plan long term in the face of distractions. **The critical in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.06721#wassname
Prioritized memory access explains planning and hippocampal replay (DOI 10.1101/225664), summarized by wassname, Fri, 02 Nov 2018 01:45:04 -0600
**TL;DR:** There are 'place cells' in the hippocampus that fire when passing through a location. You can take a rat and measure how its cells are activated in a maze, then monitor neurons during planning, rest, or sleep. You'll see patterns showing that it's thinking of locations in order and focusing on interesting locations. This paper looks at how RL agents do 'prioritized experience replay' and compares it to place cells in animals. The authors run an RL simulation and *qualitatively* compare...
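The RL mechanism being compared to hippocampal replay can be sketched simply: transitions are sampled for replay with probability proportional to a priority such as the absolute TD error, so surprising experiences are rehearsed more often. The transitions, priorities, and proportional sampling scheme below are illustrative, not the paper's exact formulation:

```python
import random

random.seed(0)

# A tiny replay buffer where each stored transition carries a priority.
buffer = [
    {"transition": "boring hallway", "td_error": 0.1},
    {"transition": "found reward",   "td_error": 2.0},
    {"transition": "dead end",       "td_error": 0.3},
]

def sample(buffer, alpha=1.0):
    """Sample one transition with probability proportional to |TD error|^alpha."""
    weights = [abs(item["td_error"]) ** alpha for item in buffer]
    return random.choices(buffer, weights=weights)[0]

counts = {item["transition"]: 0 for item in buffer}
for _ in range(1000):
    counts[sample(buffer)["transition"]] += 1
# Replay concentrates on the surprising transition, mirroring the way
# recorded place-cell sequences focus on interesting locations.
```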
http://www.shortscience.org/paper?bibtexKey=10.1101/225664#wassname
RUDDER: Return Decomposition for Delayed Rewards (arXiv 1806.07857), summarized by wassname, Sun, 28 Oct 2018 04:05:27 -0600
[Summary by author /u/SirJAM_armedi]().
Math aside, the "big idea" of RUDDER is the following: We use an LSTM to predict the return of an episode. To do this, the LSTM will have to recognize what actually causes the reward (e.g. "shooting the gun in the right direction causes the reward, even if we get the reward only once the bullet hits the enemy after travelling along the screen"). We then use a salience method (e.g. LRP or integrated gradients) to get that information out of the LSTM, and r...
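A much-simplified sketch of the redistribution step: once a model predicts the episode return from the state sequence seen so far, per-step rewards can be taken as differences between consecutive return predictions, so credit lands where the prediction jumps (the gun being fired, not the bullet landing). The actual paper extracts this via salience methods like LRP or integrated gradients; the difference rule and the hand-made predictions below are illustrative stand-ins:

```python
def redistribute(return_predictions):
    """Per-step rewards as differences of consecutive return predictions."""
    rewards = [return_predictions[0]]
    for prev, cur in zip(return_predictions, return_predictions[1:]):
        rewards.append(cur - prev)
    return rewards

# The return prediction jumps to 1.0 at step 2 ("gun fired"), long before
# the delayed environment reward would arrive at the end of the episode.
preds = [0.0, 0.0, 1.0, 1.0, 1.0]
rewards = redistribute(preds)
```

Note that the redistributed rewards telescope: they always sum to the final return prediction, so the total credit is conserved.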
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.07857#wassname
Unsupervised Learning via Meta-Learning (arXiv 1810.02334), summarized by CodyWild, Sun, 28 Oct 2018 04:05:08 -0600
This recent paper, a collaboration involving some of the authors of MAML, proposes an intriguing application of techniques developed in the field of meta-learning to the problem of unsupervised learning: specifically, the problem of developing representations without labeled data, which can then be used to learn quickly from a small amount of labeled data. As a reminder, the idea behind meta-learning is that you train models on multiple different tasks, using only a small amount of data from ea...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.02334#decodyng
Active contours without edges (DOI 10.1109/83.902291), summarized by Anmol Sharma, Sat, 13 Oct 2018 03:41:42 -0600
Typically, energy minimization or snakes-based object detection frameworks evolve a parametrized curve guided by some form of image gradient information. However, due to heavy reliance on gradients, these approaches tend to fail in scenarios where this information is misleading or unavailable. This cripples the snake and renders it unusable, as it gets stuck in a local minimum away from the actual object. Moreover, the parametrized snake lacks the ability to model multiple evolving curves in a si...
http://www.shortscience.org/paper?bibtexKey=10.1109/83.902291#anmolsharma
Snakes: Active contour models (DOI 10.1007/bf00133570), summarized by Anmol Sharma, Wed, 10 Oct 2018 20:38:38 -0600
Low-level tasks such as edge, contour, and line detection are an essential precursor to any downstream image analysis process. However, most of the approaches targeting these problems work as isolated and autonomous entities, without using any high-level image information such as context, global shapes, or user-level input. This leads to errors that can propagate further through the pipeline without providing an opportunity for future correction. In order to address this problem, Kass et al. in...
http://www.shortscience.org/paper?bibtexKey=10.1007/bf00133570#anmolsharma
Producing radiologist-quality reports for interpretable artificial intelligence (arXiv 1806.00340), summarized by Tess Berthier, Wed, 10 Oct 2018 20:18:42 -0600
The paper presents a model-agnostic extension of deep learning classifiers based on an RNN with a visual attention mechanism for report generation.
One of the most important points in this paper is not the model, but the dataset itself: Luke Oakden-Rayner, one of the authors, is a radiologist and has worked a lot to educate the public on current medical datasets ([chest x-ray blog post]()), how they are made, and what problems are associated with them. In this paper they used 50,363 f...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.00340#tessberthier
Deep Residual Learning for Image Recognition (arXiv 1512.03385), summarized by Eddie Smolansky, Wed, 03 Oct 2018 20:51:21 -0600
Sources:


Summary:
- Took first place in the 5 main ImageNet tracks
- Revolution of depth: GoogLeNet was 22 layers with 6.7% top-5 error; ResNet is 152 layers with 3.57% top-5 error
- Light on complexity: the 34-layer baseline is 18% of the FLOPs (multiply-adds) of VGG
- ResNet-152 has lower time complexity than VGG-16/19
- Extends well to detection and segmentation tasks
- Just stacking more layers gives worse performance. Why? In theory:
> A deeper model should not have higher...
http://www.shortscience.org/paper?bibtexKey=journals/corr/HeZRS15#eddiesmolansky
$S^4$Net: Single Stage Salient-Instance Segmentation (arXiv 1711.07618), summarized by Eddie Smolansky, Sun, 23 Sep 2018 20:47:58 -0600
It's like Mask R-CNN but for salient instances.
Code will be available at .
They invented a layer, "mask pooling", that they claim is better than RoI pooling and RoI align.
> As can be seen, our proposed binary RoIMasking and ternary RoIMasking both outperform RoIPool and RoIAlign in mAP$^{0.7}$. Specifically, our ternary RoIMasking result improves the RoIAlign result by around 2.5 points. This reflects that considering more context information outside the proposals does help for salient instance seg...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.07618#eddiesmolansky
The Do's and Don'ts for CNN-based Face Verification (arXiv 1705.07426), summarized by Eddie Smolansky, Sun, 23 Sep 2018 20:39:52 -0600
# Metadata
* **Title**: The Do's and Don'ts for CNN-based Face Verification
* **Authors**: Ankan Bansal, Carlos Castillo, Rajeev Ranjan, Rama Chellappa (UMIACS, University of Maryland, College Park)
* **Link**:
# Abstract
>Convolutional neural networks (CNN) have become the most sought-after tools for addressing object recognition problems. Specifically, they have produced state-of-the-art results for unconstrained face recognition and verification tasks. While the research community appears ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07426#eddiesmolansky
OPEM: Open Source PEM Cell Simulation Tool (DOI 10.21105/joss.00676), summarized by Sepand Haghighi, Sun, 23 Sep 2018 20:34:23 -0600
Modeling and simulation of proton-exchange membrane fuel cells (PEMFC) can serve as a powerful tool in the research and development of renewable energy sources. The Open-Source PEMFC Simulation Tool (OPEM) is a modeling tool for evaluating the performance of proton-exchange membrane fuel cells. This package is a combination of models (static/dynamic) that predict the optimum operating parameters of a PEMFC. OPEM contains generic models that accept as input not only values of the operating vari...
http://www.shortscience.org/paper?bibtexKey=10.21105/joss.00676#sepandhaghighi
Everybody Dance Now (arXiv 1808.07371), summarized by Oleksandr Bailo, Sat, 08 Sep 2018 10:00:26 -0600
This paper presents a per-frame image-to-image translation system enabling the copying of the motion of a person from a source video to a target person. For example, the source video might be a professional dancer performing complicated moves, while the target person is you. By utilizing this approach, it is possible to generate a video of you dancing like a professional. Check the authors' [video]() for a visual explanation.
**Data preparation**
The authors have manually recorded high-resolution vide...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1808.07371#ukrdailo
Compositional Obverter Communication Learning From Raw Visual Input (arXiv 1804.02341), summarized by Ben Bogin, Wed, 05 Sep 2018 07:15:05 -0600
This paper proposes a new training method for multi-agent communication settings. They show the following referential game: a speaker sees an image of a 3D-rendered object and describes it to a listener. The listener sees a different image and must decide if it is the same object as described by the speaker (i.e. has the same color and shape). The game can only be completed successfully if a communication protocol emerges that can express the color and shape the speaker sees.
The main contribution o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.02341#benbogin
PyCM: Multiclass confusion matrix library in Python (DOI 10.21105/joss.00729), summarized by Sepand Haghighi, Sun, 02 Sep 2018 21:04:11 -0600
PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix input, and is a proper tool for post-classification model evaluation that supports most class and overall statistics parameters. PyCM is the swiss-army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers.
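To make concrete what a multi-class confusion matrix holds, here is a small dependency-free sketch of the structure and two of the statistics such a library derives from it. The label vectors are invented, and PyCM's actual API differs from this hand-rolled version:

```python
# Rows: actual class; columns: predicted class.
actual  = ["cat", "cat", "dog", "dog", "bird", "bird"]
predict = ["cat", "dog", "dog", "dog", "bird", "cat"]

classes = sorted(set(actual) | set(predict))
matrix = {a: {p: 0 for p in classes} for a in classes}
for a, p in zip(actual, predict):
    matrix[a][p] += 1

# Overall statistic: accuracy is the trace over the total count.
overall_acc = sum(matrix[c][c] for c in classes) / len(actual)

# Per-class statistic: precision for one class.
def precision(c):
    predicted_c = sum(matrix[a][c] for a in classes)
    return matrix[c][c] / predicted_c if predicted_c else 0.0
```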
http://www.shortscience.org/paper?bibtexKey=10.21105/joss.00729#sepandhaghighi
From Babies to Robots: The Contribution of Developmental Robotics to Developmental Psychology (DOI 10.1111/cdep.12282), summarized by Natalia Diaz Rodriguez, PhD, Sat, 01 Sep 2018 22:20:36 -0600
Joint summary from
Developmental robotics is the interdisciplinary approach to the autonomous design of behavioural and cognitive capabilities in artificial agents (robots) that takes direct inspiration from the developmental principles and mechanisms observed in natural cognitive systems. It relies on a highly interdisciplinary effort across empirical developmental sciences such as developmental psychology, neuroscience, and comparative psychology, as well as computational and engineering disciplin...
http://www.shortscience.org/paper?bibtexKey=10.1111/cdep.12282#natalia
Learning with Opponent-Learning Awareness (arXiv 1709.04326), summarized by mnoukhov, Thu, 23 Aug 2018 09:55:44 -0600
Normal RL agents in multi-agent scenarios treat their opponents as a static part of the environment, not taking into account the fact that other agents are learning as well. This paper proposes LOLA, a learning rule that takes the agency and learning of opponents into account by optimizing the "return under one-step lookahead of opponent learning".
So instead of optimizing under the current parameters of agent 1 and 2
$$V^1(\theta_i^1, \theta_i^2)$$
LOLA proposes to optimize taking into acc...
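The one-step lookahead objective can be sketched as follows (notation continues from the value above; $\eta$ is the opponent's learning rate, and the paper then Taylor-expands this expression, so treat the display below as the setup rather than the full derivation):

$$V^1\left(\theta_i^1,\; \theta_i^2 + \Delta\theta_i^2\right), \qquad \Delta\theta_i^2 = \eta \, \nabla_{\theta_i^2} V^2(\theta_i^1, \theta_i^2)$$

That is, agent 1 evaluates its return assuming the opponent takes one naive gradient step on its own value.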
http://www.shortscience.org/paper?bibtexKey=journals/corr/1709.04326#mnoukhov
Towards Robust Evaluations of Continual Learning (arXiv 1805.09733), summarized by Natalia Diaz Rodriguez, PhD, Mon, 13 Aug 2018 23:01:16 -0600
Through a likelihood-focused derivation of a variational inference (VI) loss, Variational Generative Experience Replay (VGER) presents the closest appropriate likelihood-focused alternative to Variational Continual Learning (VCL), the state-of-the-art prior-focused approach to continual learning.
In non-continual learning, the aim is to learn parameters $\omega$ using labelled training data $\mathcal{D}$ to infer $p(y \vert \omega, x)$. In the continual learning context, instead, the data is not in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.09733#natalia
Banach Wasserstein GAN (arXiv 1806.06621), summarized by Artëm Sobolev, Fri, 10 Aug 2018 11:27:55 -0600
The paper extends the [WGAN]() paper by replacing the L2 norm in the transportation cost with some other metric $d(x, y)$. Following the same reasoning as in the WGAN paper, one arrives at a dual optimization problem similar to WGAN's, except that the critic $f$ has to be 1-Lipschitz w.r.t. a given norm (rather than L2). This, in turn, means that the critic's gradient (w.r.t. the input $x$) has to be bounded in the dual norm (only in Banach spaces, hence the name). The authors build upon the [WGAN-GP...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.06621#artems
Multi-layer Representation Learning for Medical Concepts (arXiv 1602.05568), summarized by Joseph Paul Cohen, Tue, 31 Jul 2018 10:05:24 -0600
This model, called Med2Vec, is inspired by Word2Vec. It is Word2Vec for time series of patient visits with ICD codes. The model learns embeddings for medical codes as well as for the demographics of patients.
The context is temporal. For each $x_t$ as input, the model predicts $x_{t+1}$ and $x_{t-1}$, or more depending on the temporal window size.
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.05568#joecohen
A Comparison of Word Embeddings for the Biomedical Natural Language Processing (arXiv 1802.00400), summarized by Joseph Paul Cohen, Sat, 28 Jul 2018 18:09:03 -0600
This paper demonstrates that Word2Vec \cite{1301.3781} can extract relationships between words and produce latent representations useful for medical data. They explore this model on different datasets, which yield different relationships between words.
The Word2Vec model works like an autoencoder that predicts the context of a word. The context of a word is composed of the surrounding words, as shown below. Given the word in the center, the neighboring words are predicted through a bottleneck in...
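The context window described above can be sketched by enumerating the (center, context) pairs that a skip-gram-style model trains on. The sentence and the window size of 2 are invented for the example:

```python
sentence = "patient was prescribed aspirin for chest pain".split()

def context_pairs(tokens, window=2):
    """(center, context) training pairs for a skip-gram style model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = context_pairs(sentence)
```

Each pair is one training example; the embedding is the bottleneck representation learned while predicting the context word from the center word.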
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.00400#joecohen
An Experimental Evaluation of the Generalizing Capabilities of Process Discovery Techniques and Black-Box Sequence Models (DOI 10.1007/978-3-319-91704-7_11), summarized by Niek Tax, Sat, 28 Jul 2018 17:33:08 -0600
# Contributions
The contribution of this paper is threefold:
1. We present a method to use *process models* as interpretable sequence models that have a stronger notion of interpretability than what is generally used in the machine learning field (see Section *process models* below),
2. We show that this approach enables the comparison of traditional sequence models (RNNs, LSTMs, Markov Models) with techniques from the research field of *automated process discovery*,
3. We show on a collection ...
http://www.shortscience.org/paper?bibtexKey=10.1007/9783319917047_11#niektax
Variational Inference with Normalizing Flows (arXiv 1505.05770), summarized by CodyWild, Wed, 25 Jul 2018 08:13:25 -0600
This paper argues for the use of normalizing flows (a way of building up new probability distributions by applying multiple sets of invertible transformations to existing distributions) as a way of building more flexible variational inference models.
The central premise of a variational autoencoder is that of learning an approximation to the posterior distribution of latent variables, $p(z \vert x)$, and parameterizing that distribution according to values produced by a neural network. In typical ...
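The change-of-variables rule that normalizing flows are built on can be shown in one dimension: push a base density through an invertible map and correct with the log-determinant of the Jacobian (for an affine map, just log|a|). The affine flow is the simplest possible choice, picked for clarity rather than taken from the paper:

```python
import math

def log_standard_normal(z):
    # log density of N(0, 1)
    return -0.5 * (z * z + math.log(2 * math.pi))

def affine_flow_logpdf(x, a=2.0, b=1.0):
    """log density of x = a*z + b with z ~ N(0, 1), via change of variables."""
    z = (x - b) / a                                  # invert the flow
    return log_standard_normal(z) - math.log(abs(a)) # Jacobian correction

# x = 1 maps back to z = 0, the mode of the base distribution.
lp = affine_flow_logpdf(1.0)
```

Stacking several such invertible maps, each contributing its own log-determinant term, is exactly how the paper builds up flexible approximate posteriors from a simple base distribution.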
http://www.shortscience.org/paper?bibtexKey=journals/corr/1505.05770#decodyng
Deep Extreme Cut: From Extreme Points to Object Segmentation (arXiv 1711.09081), summarized by Oleksandr Bailo, Mon, 23 Jul 2018 15:34:55 -0600
This paper introduces a CNN-based segmentation of an object that is defined by a user via four extreme points (i.e. a bounding box). Interestingly, in related work it has been shown that clicking extreme points is about 5 times faster than drawing a bounding box.
The extreme points have several goals in this work. First, they are used as a bounding box to crop the object of interest. Secondly, they are utilized to create a heatmap with activations in the regions o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09081#ukrdailo
Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++ (arXiv 1803.09693), summarized by Oleksandr Bailo, Mon, 23 Jul 2018 02:25:26 -0600
In this paper, the authors develop a system for automatic as well as interactive annotation (i.e. segmentation) of a dataset. In the automatic mode, bounding boxes are generated by another network (e.g. Faster R-CNN), while in the interactive mode, the input bounding box around an object of interest comes from the human in the loop.
The system is composed of the following parts:
1. **Residual encoder with skip connections**. This step acts as a feature extractor. A ResNet-50 with a few modifi...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.09693#ukrdailo
The challenge of realistic music generation: modelling raw audio at scale (arXiv 1806.10474), summarized by CodyWild, Sun, 22 Jul 2018 07:51:07 -0600
This paper draws from two strains of recent work: the hierarchical music modeling of MusicVAE, which intentionally models musical structure at both local and more global levels, and the discrete autoencoder approaches of Vector Quantized VAEs, which seek to maintain the overall structure of a VAE but apply a less aggressive form of regularization.
The goal of this paper is to build a model that can generate music, not from that music's symbolic representation (lists of notes), but from ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.10474#decodyng
Quasi-Monte Carlo Variational Inference (arXiv 1807.01604), summarized by Artëm Sobolev, Sun, 22 Jul 2018 05:48:19 -0600
Variational inference builds around the ELBO (Evidence Lower BOund), a lower bound on the marginal log-likelihood of the observed data, $\log p(x) = \log \int p(x, z) dz$ (which is typically intractable). The ELBO makes use of an approximate posterior to form a lower bound:
$$
\log p(x) \ge \mathbb{E}_{q(z \vert x)} \log \frac{p(x, z)}{q(z \vert x)}
$$
# Introduction to Quasi Monte Carlo
It's assumed that both the joint $p(x, z)$ (or, equivalently, the likelihood $p(x \vert z)$ and the prior $p(z)$) and the appro...
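The quasi-Monte Carlo idea can be sketched in one dimension: replace i.i.d. uniform draws with a low-discrepancy sequence that covers [0, 1) more evenly, which typically reduces the error of sample averages such as Monte Carlo ELBO estimates. The van der Corput sequence and the toy integrand below are illustrative; the paper works with multidimensional randomized QMC sequences:

```python
import math, random

random.seed(0)

def van_der_corput(n, base=2):
    """n-th element of the base-b van der Corput low-discrepancy sequence."""
    q, denom = 0.0, 1.0
    while n:
        n, r = divmod(n, base)
        denom *= base
        q += r / denom
    return q

f = lambda u: u * u   # toy integrand; its integral over [0, 1) is 1/3
n = 256

# Plain Monte Carlo vs. quasi-Monte Carlo estimates of the same integral.
mc = sum(f(random.random()) for _ in range(n)) / n
qmc = sum(f(van_der_corput(i)) for i in range(1, n + 1)) / n
```

Because the QMC points are spread evenly rather than clumped, the estimator's error shrinks close to O(1/n) instead of the O(1/sqrt(n)) of plain Monte Carlo.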
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.01604#artems
Learning with Opponent-Learning Awareness (arXiv 1709.04326), summarized by CodyWild, Fri, 20 Jul 2018 11:01:35 -0600
A central question of this paper is: under what circumstances will you see agents that have been trained to optimize their own reward implement strategies (like tit-for-tat) that are more sophisticated and achieve higher overall reward than each agent simply pursuing its dominant strategy? The games under consideration here are "general sum" games like Iterated Prisoner's Dilemma, where each agent's dominant strategy is to defect, but where, with some amount of coordination or reciprocity, better...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1709.04326#decodyng
Insights on representational similarity in neural networks with canonical correlation (arXiv 1806.05759), summarized by CodyWild, Thu, 19 Jul 2018 16:55:33 -0600
The overall goal of the paper is to measure how similar different layers' activation profiles are to one another, in the hope of being able to quantify the similarity of the representations that different layers are learning. If you had a measure that captured this, you could ask questions like: "how similar are the representations learned by different networks on the same task?", and "what is the dynamic of representational change in a given layer throughout training?"
Canonical Corre...
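The property that makes canonical correlation suited to this is invariance to linear re-coordinatization. A minimal one-dimensional illustration (where canonical correlation reduces to plain Pearson correlation): rescaling and shifting one "neuron" leaves the correlation unchanged, so two layers encoding the same information in different coordinates can still score as identical. The activation vectors are invented toy data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation, the 1-d special case of canonical correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

layer_a = [0.1, 0.9, 0.4, 0.7, 0.2]
layer_b = [3.0 * a + 5.0 for a in layer_a]  # same information, new coordinates
r_same = pearson(layer_a, layer_b)          # 1.0 up to rounding
```

In the full multi-neuron case, CCA searches over linear combinations of neurons on each side to find maximally correlated directions, extending this invariance to whole layers.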
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.05759#decodyng
BRUNO: A Deep Recurrent Model for Exchangeable Data (arXiv 1802.07535), summarized by Artëm Sobolev, Tue, 17 Jul 2018 23:18:12 -0600
If one is a Bayesian, one best expresses beliefs about the next observation $x_{n+1}$, after observing $x_1, \dots, x_n$, using the **posterior predictive distribution** $p(x_{n+1}\vert x_1, \dots, x_n)$. Typically one invokes the de Finetti theorem and assumes there exists an underlying model $p(x\vert\theta)$, hence $p(x_{n+1}\vert x_1, \dots, x_n) = \int p(x_{n+1} \vert \theta) p(\theta \vert x_1, \dots, x_n) d\theta$; however, this integral is far from tractable in most cases. Nevertheless, h...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.07535#artems
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.07535#artemsMon, 09 Jul 2018 17:46:37 06001712.01238journals/corr/1712.012384Learning by Asking QuestionsOleksandr BailoThis paper is about an interactive Visual Question Answering (VQA) setting in which agents must ask questions about images to learn. This closely mimics how people learn from each other using natural language and has a strong potential to learn much faster from less data. It is referred to as learning by asking (LBA) throughout the paper. The approach is composed of three models:
1. **Question proposal module** is responsible for generating _important_ questions about the image. It is a combination of...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.01238#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.01238#ukrdailoSun, 08 Jul 2018 12:32:56 06001803.07485journals/corr/1803.074852Actor and Action Video Segmentation from a SentenceOleksandr BailoThis paper performs pixel-wise segmentation of the object of interest, which is specified by a sentence. The model is composed of three main components: a **textual encoder**, a **video encoder**, and a **decoder**.
- **Textual encoder** is a pretrained word2vec model followed by a 1D CNN.
- **Video encoder** is a 3D CNN to obtain a visual representation of the video (can be combined with optical flow to obtain motion information).
- **Decoder**. Given a sentence representation $T$, a separate filt...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.07485#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.07485#ukrdailoWed, 04 Jul 2018 05:47:29 06001711.11543journals/corr/1711.115433Embodied Question AnsweringOleksandr BailoThis paper introduces a new AI task – Embodied Question Answering. The goal of this task is for an agent to be able to answer a question by observing the environment through a single egocentric RGB camera while being able to navigate inside the environment. The agent has 4 natural modules:
1. **Vision**. 224x224 RGB images are processed by a CNN to produce a fixed-size representation. This CNN is pretrained on pixel-to-pixel tasks such as RGB reconstruction, semantic segmentation, and depth est...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.11543#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.11543#ukrdailoWed, 04 Jul 2018 02:12:50 060010.18653/v1/p1610782Tree-to-Sequence Attentional Neural Machine TranslationTim MillerThis work extends sequence-to-sequence models for machine translation by using syntactic information on the source language side. This paper looks at the translation task where English is the source language, and Japanese is the target language. The dataset is the ASPEC corpus of scientific paper abstracts that seem to be in both English and Japanese? (See note below). The trees for the source (English) are generated by running the ENJU parser on the English data, resulting in binary trees, and ...
http://www.shortscience.org/paper?bibtexKey=10.18653/v1/p161078#tmills
http://www.shortscience.org/paper?bibtexKey=10.18653/v1/p161078#tmillsTue, 03 Jul 2018 15:43:38 06001804.08328journals/corr/1804.083284Taskonomy: Disentangling Task Transfer LearningOleksandr BailoThe goal of this work is to perform transfer learning among numerous tasks and to discover visual relationships among them. Specifically, while we might intuitively guess that the depth of an image and its surface normals are related, this work takes a step forward and discovers beneficial relationships among 26 tasks in terms of task transferability – many of them are not obvious. This is important for scenarios where an insufficient annotation budget is available for the target task; thus, learned repr...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.08328#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/corr/1804.08328#ukrdailoMon, 02 Jul 2018 02:46:39 06001702.02284journals/corr/1702.022842Adversarial Attacks on Neural Network PoliciesDavid StutzHuang et al. study adversarial attacks on reinforcement learning policies. One of the main problems, in contrast to supervised learning, is that there might not be a reward in every time step, meaning there is no clear objective to use. However, this is essential when crafting adversarial examples as they are mostly based on maximizing the training loss. To avoid this problem, Huang et al. assume a well-trained policy; the policy is expected to output a distribution over actions. Then, adversarial...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.02284#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.02284#davidstutzThu, 28 Jun 2018 19:16:01 06001712.03141journals/corr/1712.031412Wild Patterns: Ten Years After the Rise of Adversarial Machine LearningDavid StutzBiggio and Roli provide a comprehensive survey and discussion of work in adversarial machine learning. In contrast to related work [1,2], they explicitly discuss the relation of recent developments regarding the security of deep neural networks (as primarily discussed in [1] and [2]) and adversarial machine learning in general. The latter can be traced back to early work starting in 2004, e.g. involving adversarial attacks on spam filters. As a result, terminology used by Biggio and Roli is slig...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.03141#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.03141#davidstutzThu, 28 Jun 2018 19:11:16 06001801.00553journals/corr/1801.005532Threat of Adversarial Attacks on Deep Learning in Computer Vision: A SurveyDavid StutzAkhtar and Mian present a comprehensive survey of attacks and defenses of deep neural networks, specifically in computer vision. Published on ArXiv in January 2018, but probably written prior to August 2017, the survey includes recent attacks and defenses. For example, Table 1 presents an overview of attacks on deep neural networks – categorized by knowledge, target and perturbation measure. The authors also provide a strength measure – in the form of a 1–5 star “rating”. Personally, ho...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.00553#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.00553#davidstutzThu, 28 Jun 2018 19:06:48 06001712.07107journals/corr/1712.071072Adversarial Examples: Attacks and Defenses for Deep LearningDavid StutzYuan et al. present a comprehensive survey of attacks, defenses and studies regarding the robustness and security of deep neural networks. Published on ArXiv in December 2017, it includes most recent attacks and defenses. For example, Table 1 lists all known attacks – Yuan et al. categorize the attacks according to the level of knowledge needed, targeted or non-targeted, the optimization needed (e.g. iterative) as well as the perturbation measure employed. As a result, Table 1 gives a solid o...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.07107#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.07107#davidstutzThu, 28 Jun 2018 18:59:29 06001605.01775journals/corr/1605.017752Adversarial Diversity and Hard Positive GenerationDavid StutzRozsa et al. propose PASS, a perceptual similarity metric invariant to homographies, to quantify adversarial perturbations. In particular, PASS is based on the structural similarity metric SSIM [1]; specifically
$PASS(\tilde{x}, x) = SSIM(\psi(\tilde{x},x), x)$
where $\psi(\tilde{x}, x)$ transforms the perturbed image $\tilde{x}$ to the image $x$ by applying a homography $H$ (which can be found through optimization). Based on this similarity metric, they consider additional attacks which creat...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.01775#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.01775#davidstutzThu, 28 Jun 2018 18:32:44 06001605.07262journals/corr/1605.072622Measuring Neural Net Robustness with ConstraintsDavid StutzBastani et al. propose formal robustness measures and an algorithm for approximating them for piecewise linear networks. Specifically, the notion of robustness is similar to related work:
$\rho(f,x) = \inf\{\epsilon \geq 0 \mid f \text{ is not } (x,\epsilon)\text{-robust}\}$
where $(x,\epsilon)$-robustness demands that for every $x'$ with $\|x' - x\|_\infty \leq \epsilon$ it holds that $f(x') = f(x)$ – in other words, the label does not change for perturbations $\eta = x' - x$ which are small in terms of the $L_\...
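For a single linear classifier this robustness radius has a closed form, which gives a feel for what $\rho(f,x)$ measures; the paper's contribution is approximating it for piecewise-linear networks via constraint solving, which this toy sketch does not attempt:

```python
import numpy as np

def linf_robustness_linear(w, b, x):
    """Exact rho(f, x) for a linear classifier f(x) = sign(w.x + b).

    Under the L_inf norm, the smallest label-changing perturbation shifts
    every coordinate by |w.x + b| / ||w||_1 against the current margin.
    This closed form only holds in the linear case.
    """
    return abs(np.dot(w, x) + b) / np.abs(w).sum()

w = np.array([3.0, -4.0])
b = 1.0
x = np.array([2.0, 1.0])               # w.x + b = 6 - 4 + 1 = 3
rho = linf_robustness_linear(w, b, x)
print(rho)                             # 3/7

# Sanity check: a perturbation just past rho flips the predicted label.
eta = -np.sign(np.dot(w, x) + b) * np.sign(w) * (rho + 1e-9)
assert np.sign(np.dot(w, x + eta) + b) != np.sign(np.dot(w, x) + b)
```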
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.07262#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1605.07262#davidstutzThu, 28 Jun 2018 18:23:07 06001711.10925journals/corr/1711.109254Deep Image PriorDavid StutzUlyanov et al. utilize untrained neural networks as regularizer/prior for various image restoration tasks such as denoising, inpainting and super-resolution. In particular, the standard formulation of such tasks, i.e.
$x^\ast = \arg\min_x E(x, x_0) + R(x)$
where $x_0$ is the input image and $E$ a task-dependent data term, is rephrased as follows:
$\theta^\ast = \arg\min_\theta E(f_\theta(z); x_0)$ and $x^\ast = f_{\theta^\ast}(z)$
for a fixed but random $z$. Here, the regularizer $R$ is esse...
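A minimal numpy stand-in for this reparameterization, fitting a tiny two-layer network to a noisy 1-D signal instead of a convolutional network to an image (all sizes and the architecture are arbitrary; the paper additionally relies on architecture choice and early stopping, which are omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Corrupted" observation x0: a smooth signal plus noise.
t = np.linspace(0, 1, 64)
x0 = np.sin(2 * np.pi * t) + 0.3 * rng.normal(size=64)

# Untrained network f_theta(z) = W2 tanh(W1 z) with a fixed random code z.
# Only theta = (W1, W2) is optimized: theta* = argmin E(f_theta(z); x0).
z = rng.normal(size=16)
W1 = 0.1 * rng.normal(size=(32, 16))
W2 = 0.1 * rng.normal(size=(64, 32))

lr = 0.01
for _ in range(2000):
    h = np.tanh(W1 @ z)
    f = W2 @ h
    d = f - x0                                        # residual of E = ||f - x0||^2
    gW2 = 2 * np.outer(d, h)                          # dE/dW2
    gW1 = 2 * np.outer((W2.T @ d) * (1 - h ** 2), z)  # dE/dW1 (tanh' = 1 - h^2)
    W2 -= lr * gW2
    W1 -= lr * gW1

restored = W2 @ np.tanh(W1 @ z)
mse = np.mean((restored - x0) ** 2)
print(mse)  # near zero: the data term is fit without any explicit R(x)
```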
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.10925#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.10925#davidstutzThu, 28 Jun 2018 18:14:51 06001801.02774journals/corr/1801.027742Adversarial SpheresDavid StutzGilmer et al. study the existence of adversarial examples on a synthetic toy dataset consisting of two concentric spheres. The dataset is created by randomly sampling examples from two concentric spheres, one with radius $1$ and one with radius $R = 1.3$. While the authors argue that different difficulty levels of the dataset can be created by varying $R$ and the dimensionality, they merely experiment with $R = 1.3$ and a dimensionality of $500$. The motivation to study this dataset comes from the ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02774#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02774#davidstutzThu, 28 Jun 2018 18:02:30 06001608.08967journals/corr/1608.089673Robustness of classifiers: from adversarial to random noiseDavid StutzFawzi et al. study robustness in the transition from random samples to semi-random and adversarial samples. Specifically, they present bounds relating the norm of an adversarial perturbation to the norm of random perturbations – for the exact form I refer to the paper. Personally, I find the definition of semi-random noise most interesting, as it allows one to get an intuition for distinguishing random noise from adversarial examples. As in related literature, adversarial examples are defined as
...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.08967#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.08967#davidstutzThu, 28 Jun 2018 17:54:18 06001608.07690journals/corr/1608.076903A Boundary Tilting Persepective on the Phenomenon of Adversarial ExamplesDavid StutzTanay and Griffin introduce the boundary tilting perspective as an alternative to the “linear explanation” for adversarial examples. Specifically, they argue that it is not reasonable to assume that the linearity in deep neural networks causes the existence of adversarial examples. Originally, Goodfellow et al. [1] explained the impact of adversarial examples by considering a linear classifier:
$w^T x' = w^Tx + w^T\eta$
where $\eta$ is the adversarial perturbations. In large dimensions, the s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.07690#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.07690#davidstutzThu, 28 Jun 2018 17:50:32 06001801.09344journals/corr/1801.093443Certified Defenses against Adversarial ExamplesDavid StutzRaghunathan et al. provide an upper bound on the adversarial loss of two-layer networks and also derive a regularization method to minimize this upper bound. In particular, the authors consider the scoring functions $f^i(x) = V_i^T\sigma(Wx)$ with bounded derivative $\sigma'(z) \in [0,1]$ which holds for Sigmoid and ReLU activation functions. Still, the model is very constrained considering recent, well-performing deep (convolutional) neural networks. The upper bound is then derived by considerin...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.09344#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.09344#davidstutzThu, 28 Jun 2018 17:41:17 0600conf/icml/CisseBGDU173Parseval Networks: Improving Robustness to Adversarial ExamplesDavid StutzCisse et al. propose Parseval networks, deep neural networks regularized to learn orthonormal weight matrices. Similar to the work by Hein et al. [1], the main idea is to constrain the Lipschitz constant of the network – which essentially means constraining the Lipschitz constants of each layer independently. For weight matrices, this can be achieved by constraining the matrix norm. However, this (depending on the norm used) is often intractable during gradient descent training. Therefore, Cis...
http://www.shortscience.org/paper?bibtexKey=conf/icml/CisseBGDU17#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/icml/CisseBGDU17#davidstutzThu, 28 Jun 2018 17:33:43 06001801.02613journals/corr/1801.026133Characterizing Adversarial Subspaces Using Local Intrinsic DimensionalityDavid StutzMa et al. detect adversarial examples based on their estimated intrinsic dimensionality. I want to note that this work is also similar to [1] – in both publications, local intrinsic dimensionality is used to analyze adversarial examples. Specifically, the intrinsic dimensionality of a sample is estimated based on the radii $r_i(x)$ of the $k$nearest neighbors around a sample $x$:
$-\left(\frac{1}{k} \sum_{i = 1}^k \log \frac{r_i(x)}{r_k(x)}\right)^{-1}$.
For details regarding the original,...
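The maximum-likelihood LID estimate is straightforward to sketch in numpy; the example data (a 5-dimensional ball embedded in 20 dimensions) is illustrative, not from the paper:

```python
import numpy as np

def lid_mle(x, data, k=20):
    """Maximum-likelihood local intrinsic dimensionality at x.

    r_i are the distances to the k nearest neighbours of x in `data`;
    LID(x) = -( (1/k) * sum_i log(r_i / r_k) )^{-1}.
    """
    r = np.sort(np.linalg.norm(data - x, axis=1))[:k]
    r = r[r > 0]                      # drop x itself if it occurs in `data`
    return -1.0 / np.mean(np.log(r / r[-1]))

rng = np.random.default_rng(1)
# Points uniform in a 5-dimensional ball, embedded in 20 dimensions:
# the estimate at the center should recover roughly 5, not 20.
low = rng.normal(size=(5000, 5))
low /= np.linalg.norm(low, axis=1, keepdims=True)
low *= rng.uniform(size=(5000, 1)) ** (1 / 5)
data = np.concatenate([low, np.zeros((5000, 15))], axis=1)
print(lid_mle(np.zeros(20), data, k=100))  # roughly 5
```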
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02613#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.02613#davidstutzWed, 27 Jun 2018 21:38:25 06001703.00410journals/corr/1703.004103Detecting Adversarial Samples from ArtifactsDavid StutzFeinman et al. use dropout to compute an uncertainty measure that helps to identify adversarial examples. Their so-called Bayesian Neural Network Uncertainty is computed as follows:
$\frac{1}{T} \sum_{i=1}^T \hat{y}_i^T \hat{y}_i - \left(\frac{1}{T}\sum_{i=1}^T \hat{y}_i\right)^T\left(\frac{1}{T}\sum_{i=1}^T \hat{y}_i\right)$
where $\{\hat{y}_1,\ldots,\hat{y}_T\}$ is a set of stochastic predictions (i.e. predictions with different noise patterns in the dropout layers). Here, it can easily be seen that this measure co...
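A small numpy sketch of this measure, taking the stochastic predictions as given (their production via test-time dropout is omitted):

```python
import numpy as np

def bnn_uncertainty(preds):
    """Dropout-based uncertainty over T stochastic predictions.

    preds: (T, C) array of T stochastic softmax outputs y_hat_i.
    Returns (1/T) sum_i y_i^T y_i - y_bar^T y_bar, i.e. the total
    variance of the prediction vector across dropout masks.
    """
    preds = np.asarray(preds, dtype=float)
    mean = preds.mean(axis=0)
    return float(np.mean(np.sum(preds ** 2, axis=1)) - np.dot(mean, mean))

# Agreeing predictions give zero uncertainty; disagreeing ones a positive value.
stable = [[0.9, 0.1]] * 10
unstable = [[0.9, 0.1], [0.1, 0.9]] * 5
print(bnn_uncertainty(stable))    # 0.0
print(bnn_uncertainty(unstable))  # 0.32
```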
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.00410#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.00410#davidstutzWed, 27 Jun 2018 21:29:35 06001705.07263journals/corr/1705.072633Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection MethodsDavid StutzCarlini and Wagner study the effectiveness of adversarial example detectors as defense strategy and show that most of them can by bypassed easily by known attacks. Specifically, they consider a set of adversarial example detection schemes, including neural networks as detectors and statistical tests. After extensive experiments, the authors provide a set of lessons which include:
- Randomization is by far the most effective defense (e.g. dropout).
- Defenses seem to be dataset-specific. There is...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07263#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07263#davidstutzWed, 27 Jun 2018 21:22:18 06001702.06280journals/corr/1702.062803On the (Statistical) Detection of Adversarial ExamplesDavid StutzGrosse et al. use statistical tests to detect adversarial examples; additionally, machine learning algorithms are adapted to detect adversarial examples on the fly while performing classification. The idea of using statistical tests to detect adversarial examples is simple: assuming that there is a true data distribution, a machine learning algorithm can only approximate this distribution – i.e. each algorithm “learns” an approximate distribution. The ideal adversary uses this discrepancy to d...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.06280#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1702.06280#davidstutzWed, 27 Jun 2018 21:08:28 06001711.09404journals/corr/1711.094043Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input GradientsDavid StutzRoss and Doshi-Velez propose input gradient regularization to improve robustness and interpretability of neural networks. As the discussion of interpretability is quite limited in the paper, the main contribution is an extensive evaluation of input gradient regularization against adversarial examples – in comparison to defenses such as distillation or adversarial training. Specifically, input regularization as proposed in [1] is used:
$\arg\min_\theta H(y,\hat{y}) + \lambda \|\nabla_x H(y,\ha...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09404#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.09404#davidstutzWed, 27 Jun 2018 20:04:56 0600conf/nips/HeinA173Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation.David StutzHein and Andriushchenko give an intuitive bound on the robustness of neural networks based on the local Lipschitz constant. With robustness, the authors refer to a small $\epsilon$-ball around each sample; this ball is supposed to describe the region where the neural network predicts a constant class. This means that adversarial perturbations have to be large enough to leave these robust areas. Larger $\epsilon$-balls imply higher robustness to adversarial examples.
When considering a singl...
http://www.shortscience.org/paper?bibtexKey=conf/nips/HeinA17#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/nips/HeinA17#davidstutzWed, 27 Jun 2018 19:57:22 06001802.01421journals/corr/1802.014213Adversarial Vulnerability of Neural Networks Increases With Input DimensionDavid StutzSimon-Gabriel et al. study the robustness of neural networks with respect to the input dimensionality. Their main hypothesis is that the vulnerability of neural networks against adversarial perturbations increases with the input dimensionality. To support this hypothesis, they provide a theoretical analysis as well as experiments.
The general idea of robustness is that small perturbations $\delta$ of the input $x$ result only in small variations $\delta \mathcal{L}$ of the loss:
$\delta \ma...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.01421#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.01421#davidstutzWed, 27 Jun 2018 19:41:53 06001703.09202journals/corr/1703.092023Biologically inspired protection of deep networks from adversarial attacksDavid StutzNayebi and Ganguli propose saturating neural networks as a defense against adversarial examples. The main observation driving this paper can be stated as follows: Neural networks are essentially based on linear sums of neurons (e.g. fully connected layers, convolutional layers) which are then activated; by injecting a small amount of noise per neuron it is possible to shift the final sum by large values, thereby propagating the noise through the network and fooling the network into misclassifying a...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.09202#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.09202#davidstutzWed, 27 Jun 2018 19:25:51 06001704.01155journals/corr/1704.011553Feature Squeezing: Detecting Adversarial Examples in Deep Neural NetworksDavid StutzXu et al. propose feature squeezing for detecting and defending against adversarial examples. In particular, they consider “squeezing” the bit depth of the input images as well as local and non-local smoothing (Gaussian, median filtering etc.). In experiments they show that feature squeezing preserves accuracy while defending against adversarial examples. Figure 1 additionally shows an illustration of how feature squeezing can be used to detect adversarial examples.
Figure 1: Illustration ...
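Both squeezers are easy to sketch in numpy; the 1-D median filter below is a stand-in for the paper's 2-D spatial smoothing, and detection would then compare the model's outputs on the original and squeezed inputs:

```python
import numpy as np

def squeeze_bit_depth(x, bits):
    """Reduce values in [0, 1] to the given bit depth."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth_1d(x, width=3):
    """Local median smoothing (1-D stand-in for a 2-D median filter)."""
    pad = width // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + width]) for i in range(len(x))])

x = np.array([0.12, 0.48, 0.52, 0.91])
print(squeeze_bit_depth(x, 1))                            # [0. 0. 1. 1.]
print(median_smooth_1d(np.array([0.1, 0.9, 0.1, 0.1])))   # spike removed
```

Squeezing destroys the fine-grained structure an adversarial perturbation depends on, while leaving a natural input's prediction nearly unchanged; a large gap between the two predictions therefore flags a suspicious input.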
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.01155#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.01155#davidstutzWed, 27 Jun 2018 19:17:53 060010.1101/2625013Generative adversarial networks uncover epidermal regulators and predict single cell perturbationsDavid StutzLee et al. propose a variant of adversarial training where a generator is trained simultaneously to generated adversarial perturbations. This approach follows the idea that it is possible to “learn” how to generate adversarial perturbations (as in [1]). In this case, the authors use the gradient of the classifier with respect to the input as hint for the generator. Both generator and classifier are then trained in an adversarial setting (analogously to generative adversarial networks), see t...
http://www.shortscience.org/paper?bibtexKey=10.1101/262501#davidstutz
http://www.shortscience.org/paper?bibtexKey=10.1101/262501#davidstutzWed, 27 Jun 2018 19:08:46 06001710.10571journals/corr/1710.105713Certifying Some Distributional Robustness with Principled Adversarial TrainingDavid StutzSinha et al. introduce a variant of adversarial training based on distributionally robust optimization. I strongly recommend reading the paper for understanding the introduced theoretical framework. The authors also provide guarantees on the obtained adversarial loss – and show experimentally that this guarantee is a realistic indicator. The adversarial training variant itself follows the general strategy of training on adversarially perturbed training samples in a min-max framework. In each ite...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.10571#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.10571#davidstutzWed, 27 Jun 2018 19:00:07 06001511.05432journals/corr/1511.054323Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust OptimizationDavid StutzShaham et al. provide an interpretation of adversarial training in the context of robust optimization. In particular, adversarial training is posed as a min-max problem (similar to other related work, as I found):
$\min_\theta \sum_i \max_{r \in U_i} J(\theta, x_i + r, y_i)$
where $U_i$ is called the uncertainty set corresponding to sample $x_i$ – in the context of adversarial examples, this might be an $\epsilon$-ball around the sample quantifying the maximum perturbation allowed; $(x_i, y_i)...
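A minimal instance of this min-max problem for a logistic-regression model, where the inner maximization over an $L_\infty$ uncertainty set can be solved exactly (for deep networks it has to be approximated, e.g. by gradient-based attacks; everything below is an illustrative toy):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_train_step(w, b, X, y, eps, lr):
    """One min-max step: exact inner max over U_i = {r : ||r||_inf <= eps}.

    For a linear model the loss is monotone in the logit, so the maximizer
    is r = -eps * (2y - 1) * sign(w): it most reduces the correct-class margin.
    The outer min is one gradient step on the worst-case points x_i + r.
    """
    r = -eps * (2 * y - 1)[:, None] * np.sign(w)[None, :]
    Xa = X + r
    g = sigmoid(Xa @ w + b) - y            # d(logistic loss)/d(logit)
    return w - lr * Xa.T @ g / len(y), b - lr * g.mean()

rng = np.random.default_rng(0)
shift = np.where(rng.random(200) < 0.5, 2.0, -2.0)[:, None]
X = rng.normal(size=(200, 2)) + shift      # two clusters
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # separable labels

w, b = np.zeros(2), 0.0
for _ in range(1000):
    w, b = adv_train_step(w, b, X, y, eps=0.5, lr=0.1)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y > 0.5))
print(acc)  # high clean accuracy despite training only on worst-case points
```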
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.05432#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.05432#davidstutzWed, 27 Jun 2018 18:53:50 06001511.03034journals/corr/1511.030343Learning with a Strong AdversaryDavid StutzHuang et al. propose a variant of adversarial training called “learning with a strong adversary”. In spirit, the idea is also similar to related work [1]. In particular, the authors consider the min-max objective
$\min_g \sum_i \max_{\|r^{(i)}\| \leq c} l(g(x_i + r^{(i)}), y_i)$
where $g$ ranges over expressible functions and $(x_i, y_i)$ is a training sample. In the remainder of the paper, Huang et al. address the problem of efficiently computing $r^{(i)}$ – i.e. a strong adversarial exam...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.03034#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.03034#davidstutzWed, 27 Jun 2018 18:47:59 06001507.00677journals/corr/1507.006773Distributional Smoothing with Virtual Adversarial TrainingDavid StutzMiyato et al. propose distributional smoothing (or virtual adversarial training) as a defense against adversarial examples. However, I think that both terms do not give a good intuition of what is actually done. Essentially, a regularization term is introduced. Letting $p(y\vert x,\theta)$ be the learned model, the regularizer is expressed as
$\text{KL}(p(y\vert x,\theta) \Vert p(y\vert x+r,\theta))$
where $r$ is the perturbation that maximizes the Kullback-Leibler divergence above, i.e.
$r = \arg\max_r \{\text{KL}(...
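A sketch of the regularizer for a hypothetical two-class softmax model; note that the paper finds the maximizing $r$ efficiently with a power-iteration-style scheme, whereas this illustration simply searches random directions on the constraint sphere:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def vat_regularizer(predict, x, radius=0.1, n_dirs=100, rng=None):
    """max over ||r|| <= radius of KL(p(.|x) || p(.|x+r)), by random search."""
    if rng is None:
        rng = np.random.default_rng(0)
    p = predict(x)
    best = 0.0
    for _ in range(n_dirs):
        r = rng.normal(size=x.shape)
        r *= radius / np.linalg.norm(r)          # project onto the sphere
        best = max(best, kl(p, predict(x + r)))
    return best

# Toy model: softmax of a fixed linear map (purely illustrative).
W = np.array([[3.0, 0.0], [0.0, 3.0]])
predict = lambda x: softmax(W @ x)
print(vat_regularizer(predict, np.array([1.0, 0.0])))   # confident region: small
print(vat_regularizer(predict, np.array([0.01, 0.0])))  # near boundary: larger
```

The regularizer is large exactly where the predictive distribution is sensitive to small input changes, which is what "distributional smoothing" penalizes.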
http://www.shortscience.org/paper?bibtexKey=journals/corr/1507.00677#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1507.00677#davidstutzWed, 27 Jun 2018 18:43:14 06001707.06728journals/corr/1707.067283Efficient Defenses Against Adversarial AttacksDavid StutzZantedeschi et al. propose Gaussian data augmentation in conjunction with bounded $\text{ReLU}$ activations as a defense strategy against adversarial examples. Here, Gaussian data augmentation refers to the practice of adding Gaussian noise to the input during training.
http://www.shortscience.org/paper?bibtexKey=journals/corr/1707.06728#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1707.06728#davidstutzWed, 27 Jun 2018 18:29:49 06001602.02389journals/corr/1602.023893Ensemble Robustness of Deep Learning AlgorithmsDavid StutzZahavy et al. introduce the concept of ensemble robustness and show that it can be used as an indicator of generalization performance. In particular, the main idea is to lift the concept of robustness against adversarial examples to ensembles of networks – as trained, e.g. through Dropout or Bayes-by-Backprop. Letting $Z$ denote the sample set, a learning algorithm is $(K, \epsilon)$-robust if $Z$ can be divided into $K$ disjoint sets $C_1,\ldots,C_K$ such that for every training set $s_1,\ldots,s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.02389#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1602.02389#davidstutzWed, 27 Jun 2018 18:18:37 06001712.00673journals/corr/1712.006733Towards Robust Neural Networks via Random SelfensembleDavid StutzLiu et al. propose randomizing neural networks, implicitly learning an ensemble of models, to defend against adversarial attacks. In particular, they introduce Gaussian noise layers before regular convolutional layers. The noise can be seen as additional parameter of the model. During training, noise is randomly added. During testing, the model is evaluated on a single testing input using multiple random noise vectors; this essentially corresponds to an ensemble of different models (parameterize...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.00673#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.00673#davidstutzWed, 27 Jun 2018 18:07:27 06001711.01768journals/corr/1711.017683Towards Reverse-Engineering Black-Box Neural NetworksDavid StutzOh et al. propose two different approaches for whitening black-box neural networks, i.e. predicting details of their internals such as architecture or training procedure. In particular, they consider attributes regarding architecture (activation function, dropout, max pooling, kernel size of convolutional layers, number of convolutional/fully connected layers etc.), attributes concerning optimization (batch size and optimization algorithm) and attributes regarding the data (data split and size)...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.01768#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.01768#davidstutzWed, 27 Jun 2018 17:59:09 06001704.01547journals/corr/1704.015473Comment on "Biologically inspired protection of deep networks from adversarial attacks"David StutzBrendel et al. propose a decision-based black-box attack against (deep convolutional) neural networks. Specifically, the so-called Boundary Attack starts with a random adversarial example (i.e. random noise that is not classified as the image to be attacked) and randomly perturbs this initialization to move closer to the target image while remaining misclassified. In pseudo code, the algorithm is described in Algorithm 1. The key component is the proposal distribution $P$ used to guide the adversar...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.01547#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1704.01547#davidstutzTue, 26 Jun 2018 21:40:59 06001708.03999journals/corr/1708.039993ZOO: Zeroth Order Optimization based Black-box Attacks to Deep Neural Networks without Training Substitute ModelsDavid StutzChen et al. propose a gradient-based black-box attack to compute adversarial examples. Specifically, they follow the general idea of [1] where the following objective is optimized:
$\min_x \|x - x_0\|_2 + c \max\{\max_{i\neq t}\{z_i\} - z_t, -\kappa\}$.
Here, $x$ is the adversarial example based on training sample $x_0$. The second part expresses that $x$ is supposed to be misclassified, i.e. the logit $z_i$ for some $i \neq t$ distinct from the true label $t$ is supposed to be larger tha...
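Evaluating this objective needs only black-box access to the logits, which is the setting ZOO exploits (its finite-difference gradient estimation and coordinate descent are omitted here); a toy sketch with an illustrative linear “network”:

```python
import numpy as np

def cw_objective(x, x0, logits, t, c=1.0, kappa=0.0):
    """||x - x0||_2 + c * max(max_{i != t} z_i - z_t, -kappa).

    `logits` is any callable returning the vector z of class scores;
    no gradients of the model are required to evaluate the objective.
    """
    z = logits(x)
    others = np.max(np.delete(z, t))
    return np.linalg.norm(x - x0) + c * max(others - z[t], -kappa)

# Illustrative linear "network": z = W x.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
logits = lambda x: W @ x
x0 = np.array([2.0, 0.0])                           # class 0 currently wins
print(cw_objective(x0, x0, logits, t=1))            # 2.0: class 1 not reached
print(cw_objective(np.array([0.5, 1.0]), x0, logits, t=1))  # lower: nearby and reached
```

Any optimizer that can query this scalar (e.g. coordinate-wise finite differences, as in ZOO) can then drive `x` toward a small-norm perturbation that flips the prediction.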
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.03999#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.03999#davidstutzTue, 26 Jun 2018 21:25:44 06001708.01697journals/corr/1708.016973Adversarial Robustness: Softmax versus OpenmaxDavid StutzRozsa et al. describe an adversarial attack against OpenMax [1] by directly targeting the logits. Specifically, they assume a network using OpenMax instead of a SoftMax layer to compute the final class probabilities. OpenMax allows “open-set” networks by also allowing input samples to be rejected. By directly targeting the logits of the trained network, i.e. iteratively pushing the logits in a target direction, it does not matter whether SoftMax or OpenMax layers are used on top, the network can b...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.01697#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.01697#davidstutzTue, 26 Jun 2018 21:19:55 06001511.07528journals/corr/1511.075283The Limitations of Deep Learning in Adversarial SettingsDavid StutzPapernot et al. introduce a novel attack on deep networks based on so-called adversarial saliency maps that are computed independently of a loss. Specifically, they consider – for a given network $F(X)$ – the forward derivative
$\nabla F = \frac{\partial F}{\partial X} = \left[\frac{\partial F_j(X)}{\partial x_i}\right]_{i,j}$.
Essentially, this is the regular derivative of $F$ with respect to its input; Papernot et al. seem to refer to it as the “forward” derivative as it stands in contra...
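A finite-difference sketch of this forward derivative for a toy softmax model (the paper evaluates it analytically, layer by layer):

```python
import numpy as np

def forward_derivative(F, X, h=1e-6):
    """Numerically evaluate the Jacobian of the network output with respect
    to its *input* (not its weights, as in backpropagation of a loss).

    F: callable mapping an input vector to an output vector.
    Returns J with J[i, j] = dF_j(X) / dx_i.
    """
    F0 = F(X)
    J = np.zeros((X.size, F0.size))
    for i in range(X.size):
        Xp = X.copy()
        Xp[i] += h
        J[i] = (F(Xp) - F0) / h
    return J

# Toy "network": F(X) = softmax(W X), purely illustrative.
W = np.array([[2.0, -1.0], [0.0, 1.0]])
softmax = lambda z: np.exp(z) / np.exp(z).sum()
F = lambda X: softmax(W @ X)
J = forward_derivative(F, np.array([0.5, 0.5]))
print(J)
# An adversarial saliency map would now rank input features by how strongly
# they push the target class up while pushing all other classes down.
```

Since the softmax outputs sum to one, each row of the Jacobian sums to zero, which is a convenient sanity check for the computation.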
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.07528#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.07528#davidstutzTue, 26 Jun 2018 21:14:29 06001712.02779journals/corr/1712.027793A Rotation and a Translation Suffice: Fooling CNNs with Simple TransformationsDavid StutzEngstrom et al. demonstrate that spatial transformations such as translations and rotations can be used to generate adversarial examples. Personally, however, I think that the paper does not address the question where adversarial perturbations “end” and generalization issues “start”. For larger translations and rotations, the problem is clearly a problem of generalization. Small ones could also be interpreted as adversarial perturbations – especially when they are computed under the in...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.02779#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.02779#davidstutzTue, 26 Jun 2018 21:05:51 06001607.02533journals/corr/1607.025333Adversarial examples in the physical worldDavid StutzKurakin et al. demonstrate that adversarial examples are also a concern in the physical world. Specifically, adversarial examples are crafted digitally and then printed to see if the classification network, running on a smartphone, still misclassifies the examples. In many cases, adversarial examples are still able to fool the network, even after printing.
Figure 1: Illustration of the experimental setup.
Also find this summary at [davidstutz.de]().
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.02533#davidstutz
Published: Tue, 26 Jun 2018 21:01:38 -0600

Deep contextualized word representations (arXiv:1802.05365, mnoukhov)
This paper introduces a deep universal word embedding based on a bidirectional LM (in this case, a biLSTM). First, words are embedded with a CNN-based, character-level, context-free token embedding into $x_k^{LM}$, and then each sentence is parsed using a biLSTM, maximizing the log-likelihood of a word given its forward and backward context (much like a normal language model).
The innovation is in taking the output of each layer of the LSTM ($h_{k,j}^{LM}$ being the output at layer $j$)
$...
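The combination step can be sketched as a softmax-weighted sum over the per-layer outputs $h_{k,j}^{LM}$ with a learned scale; the names `w`, `gamma` and the shapes below are assumptions for illustration, not the paper's code:

```python
import numpy as np

def elmo_combine(layer_outputs, w, gamma):
    """Collapse per-layer representations h_{k,j} into one task-specific
    embedding: a softmax-weighted sum over layers, scaled by gamma."""
    s = np.exp(w - w.max())
    s = s / s.sum()                       # softmax over the layer weights
    # layer_outputs: (num_layers, seq_len, dim)
    return gamma * np.tensordot(s, layer_outputs, axes=1)

layers = np.random.default_rng(1).normal(size=(3, 5, 8))  # 3 layers, 5 tokens, dim 8
emb = elmo_combine(layers, w=np.zeros(3), gamma=0.5)
# With uniform weights (w = 0), this equals 0.5 * mean over layers.
```

The point of learning the weights per downstream task is that different layers capture different information (e.g. syntax lower, semantics higher), so each task can pick its own mixture.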
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.05365#mnoukhov
Published: Tue, 26 Jun 2018 20:56:47 -0600

NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles (arXiv:1707.03501, David Stutz)
Lu et al. present experiments regarding adversarial examples in the real world, i.e. after printing them. Personally, I find it interesting that researchers are studying how networks can be fooled by physically perturbing images. For me, one of the main conclusions is that it is very hard to evaluate the robustness of networks against physical perturbations. Often it is unclear whether changed lighting conditions, distances or viewpoints to objects might cause the network to fail – which means...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1707.03501#davidstutz
Published: Tue, 26 Jun 2018 20:56:04 -0600

Adversarial Machine Learning at Scale (arXiv:1611.01236, David Stutz)
Kurakin et al. present some larger-scale experiments using adversarial training on ImageNet to increase robustness. In particular, they claim to be the first to use adversarial training on ImageNet. Furthermore, they provide experiments underlining the following conclusions:
- Adversarial training can also be seen as a regularizer. This, however, is not surprising, as training on noisy training samples is also known to act as regularization.
- Label leaking describes the observation that an adversar...
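As a toy illustration of adversarial training (here on a logistic-regression stand-in, not the paper's ImageNet setup), one step perturbs the batch with FGSM and then descends on the mixed clean-plus-adversarial batch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_training_step(w, X, y, eps=0.1, lr=0.1):
    """One adversarial training step for logistic regression (illustrative):
    perturb each sample with FGSM, then take a gradient step on the
    combined clean + adversarial batch."""
    p = sigmoid(X @ w)
    grad_x = (p - y)[:, None] * w           # dLoss/dx for cross-entropy loss
    X_adv = X + eps * np.sign(grad_x)       # FGSM perturbation of the inputs
    X_mix = np.vstack([X, X_adv])
    y_mix = np.concatenate([y, y])
    p_mix = sigmoid(X_mix @ w)
    grad_w = X_mix.T @ (p_mix - y_mix) / len(y_mix)
    return w - lr * grad_w

rng = np.random.default_rng(2)
X = rng.normal(size=(16, 4))
y = (X[:, 0] > 0).astype(float)             # synthetic, separable by x_0
w = np.zeros(4)
for _ in range(200):
    w = adversarial_training_step(w, X, y)
```

The regularization reading from the summary is visible here: the model is effectively trained on noisy copies of every sample.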
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.01236#davidstutz
Published: Tue, 26 Jun 2018 20:53:02 -0600

Delving into Transferable Adversarial Examples and Black-box Attacks (arXiv:1611.02770, David Stutz)
Liu et al. provide a comprehensive study on the transferability of adversarial examples, considering different attacks and models on ImageNet. In their experiments, they consider both targeted and non-targeted attacks and also provide a real-world example by attacking clarifai.com. Here, I want to list some interesting conclusions drawn from their experiments:
- Non-targeted attacks easily transfer between models; targeted attacks, in contrast, generally do not transfer – meaning that the target...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.02770#davidstutz
Published: Tue, 26 Jun 2018 20:45:23 -0600

Universal adversarial perturbations (arXiv:1610.08401, David Stutz)
Moosavi-Dezfooli et al. propose universal adversarial perturbations – perturbations that are image-agnostic. Specifically, they extend the framework for crafting adversarial examples, i.e. by iteratively solving
$\arg\min_r \|r\|_2$ s.t. $f(x + r) \neq f(x)$.
Here, $r$ denotes the adversarial perturbation, $x$ a training sample and $f$ the neural network. Instead of solving this problem for a specific $x$, the authors propose to solve the problem over the full training set, i.e. in each ite...
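The accumulation scheme can be sketched for a binary linear classifier, where the per-sample minimal perturbation (computed with DeepFool in the paper) has a closed form; everything below is an illustrative stand-in, not the paper's setup:

```python
import numpy as np

def flip_step(x, w, overshoot=0.05):
    """Minimal L2 perturbation flipping sign(w @ x) for a linear classifier
    (a closed-form stand-in for the DeepFool step used in the paper)."""
    return -(1 + overshoot) * (w @ x) / (w @ w) * w

def universal_perturbation(X, w, xi=2.0, passes=5):
    """Accumulate an image-agnostic perturbation v: for every sample still
    classified as before, add the minimal flipping step, then project v
    back onto the L2 ball of radius xi."""
    v = np.zeros(X.shape[1])
    for _ in range(passes):
        for x in X:
            if np.sign(w @ (x + v)) == np.sign(w @ x):
                v = v + flip_step(x + v, w)
                n = np.linalg.norm(v)
                if n > xi:
                    v = v * (xi / n)       # projection onto ||v||_2 <= xi
    return v

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 5)) + 1.0         # samples clustered on one side
w = np.ones(5)
v = universal_perturbation(X, w)
fooled = np.mean(np.sign(X @ w) != np.sign((X + v) @ w))
```

The single vector `v` is what makes the perturbation "universal": it is built once over the whole set and then added unchanged to every input.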
http://www.shortscience.org/paper?bibtexKey=journals/corr/1610.08401#davidstutz
Published: Tue, 26 Jun 2018 20:39:55 -0600

Towards Evaluating the Robustness of Neural Networks (arXiv:1608.04644, David Stutz)
Carlini and Wagner propose three novel methods/attacks for crafting adversarial examples and show that defensive distillation is not effective. In particular, they devise attacks for all three commonly used norms $L_1$, $L_2$ and $L_\infty$ – which are used to measure the deviation of the adversarial perturbation from the original testing sample. In the course of the paper, starting with the targeted objective
$\min_\delta d(x, x + \delta)$ s.t. $f(x + \delta) = t$ and $x+\delta \in [0,1]^n$,
they cons...
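A common way to make such a constrained objective tractable, and the spirit of these attacks, is a penalty reformulation with a hinge on the logit margin; the sketch below uses a linear model and plain gradient descent as stand-ins for the paper's networks and optimizer:

```python
import numpy as np

def cw_attack(x, W, target, c=2.0, kappa=0.5, lr=0.01, steps=500):
    """Sketch of a penalty reformulation of the targeted objective:
    minimize ||d||_2^2 + c * max(z_j(x+d) - z_t(x+d) + kappa, 0),
    for a linear model z = W @ x (an illustrative stand-in)."""
    d = np.zeros_like(x)
    for _ in range(steps):
        z = W @ (x + d)
        others = [k for k in range(len(z)) if k != target]
        j = max(others, key=lambda k: z[k])      # strongest competing class
        grad = 2.0 * d                           # gradient of the distance term
        if z[j] - z[target] + kappa > 0:         # hinge on the margin is active
            grad = grad + c * (W[j] - W[target])
        d = d - lr * grad
    return d

rng = np.random.default_rng(4)
W = rng.normal(size=(3, 6))
x = rng.normal(size=6)
target = int(np.argmin(W @ x))                   # hardest class as the target
d = cw_attack(x, W, target)
```

The constant `c` trades off perturbation size against the classification constraint, and `kappa` is a confidence margin by which the target logit must win.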
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.04644#davidstutz
Published: Tue, 26 Jun 2018 20:24:14 -0600

Towards Deep Learning Models Resistant to Adversarial Attacks (arXiv:1706.06083, David Stutz)
Madry et al. provide an interpretation of training on adversarial examples as a saddle-point (i.e. min-max) problem. Based on this formulation, they conduct several experiments on MNIST and CIFAR-10 supporting the following conclusions:
- Projected gradient descent might be the “strongest” adversary using first-order information. Here, gradient descent is used to maximize the loss of the classifier directly while always projecting onto the set of “allowed” perturbations (e.g. within an $\epsil...
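Projected gradient descent as described here can be sketched on a logistic model (an assumed stand-in; the paper attacks deep networks): ascend the loss gradient, then project back onto the $L_\infty$ ball of radius $\epsilon$ around the original input.

```python
import numpy as np

def pgd_attack(x, y, w, eps=0.3, alpha=0.05, steps=40):
    """PGD on the cross-entropy loss of a logistic model (illustrative):
    take signed gradient ascent steps, then project back onto ||.||_inf <= eps."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv)))
        grad = (p - y) * w                         # dLoss/dx for cross-entropy
        x_adv = x_adv + alpha * np.sign(grad)      # ascent step (maximize loss)
        x_adv = x + np.clip(x_adv - x, -eps, eps)  # projection onto the eps-ball
    return x_adv

w = np.ones(5)
x = np.full(5, 0.2)            # w @ x = 1.0: correctly classified as class 1
x_adv = pgd_attack(x, 1.0, w)  # drives w @ x_adv to -0.5: misclassified
```

In the saddle-point view, this inner maximization supplies the adversary, while the outer minimization trains the classifier on its output.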
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.06083#davidstutz
Published: Tue, 26 Jun 2018 20:08:20 -0600

Explaining and Harnessing Adversarial Examples (arXiv:1412.6572, David Stutz)
Goodfellow et al. introduce the fast gradient sign method (FGSM) to craft adversarial examples and further provide a possible interpretation of adversarial examples considering linear models. FGSM is a gradient-based, one-step method for generating adversarial examples. In particular, letting $J$ be the objective optimized during training and $\epsilon$ be the maximum $\infty$-norm of the adversarial perturbation, FGSM computes
$x' = x + \eta = x + \epsilon \text{sign}(\nabla_x J(x, y))$
where $y...
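For a binary logistic model (my stand-in, not the paper's setup), the FGSM update above is essentially a one-liner:

```python
import numpy as np

def fgsm(x, y, w, eps):
    """x' = x + eps * sign(grad_x J(x, y)) for binary logistic loss J
    (a minimal stand-in for the networks used in the paper)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    grad = (p - y) * w               # gradient of cross-entropy w.r.t. the input
    return x + eps * np.sign(grad)

w = np.array([1.0, -2.0, 0.5])
x = np.array([1.0, -1.0, 1.0])       # w @ x = 3.5: confidently class 1
x_adv = fgsm(x, y=1.0, w=w, eps=0.25)
# Each coordinate moves by exactly eps against the model, lowering
# w @ x_adv from 3.5 to 2.625 in a single step.
```

The sign makes the perturbation maximally damaging per unit of $\infty$-norm, which is exactly the linear-model intuition the paper builds on.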
http://www.shortscience.org/paper?bibtexKey=journals/corr/1412.6572#davidstutz
Published: Tue, 26 Jun 2018 20:02:41 -0600

Ensemble Adversarial Training: Attacks and Defenses (arXiv:1705.07204, David Stutz)
Tramèr et al. introduce both a novel adversarial attack as well as a defense mechanism against black-box attacks termed ensemble adversarial training. I first want to highlight that – in addition to the proposed methods – the paper gives a very good discussion of state-of-the-art attacks as well as defenses and how to put them into context. Tramèr et al. consider black-box attacks, focusing on transferable adversarial examples. Their main observation is as follows: one-shot attacks (i.e....
http://www.shortscience.org/paper?bibtexKey=journals/corr/1705.07204#davidstutz
Published: Tue, 26 Jun 2018 19:56:11 -0600

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks (arXiv:1511.04508, David Stutz)
Papernot et al. build upon the idea of network distillation [1] and propose a simple mechanism to defend networks against adversarial attacks. The main idea of distillation – originally introduced to “distill” the knowledge of very deep networks into smaller ones – is to train a second, possibly smaller network, with the probability distributions of the original, possibly larger network as supervision. Papernot et al., as well as the authors of [1], argue that the probability distributions...
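The softened supervision signal in distillation comes from a temperature-scaled softmax; a minimal sketch (the temperature value here is arbitrary, for illustration):

```python
import numpy as np

def softmax_with_temperature(z, T):
    """Softened class probabilities: dividing the logits by a temperature T > 1
    flattens the distribution, exposing relative class similarities. These
    softened teacher outputs supervise the second (distilled) network."""
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

logits = np.array([8.0, 2.0, 0.0])
hard = softmax_with_temperature(logits, T=1)    # near one-hot
soft = softmax_with_temperature(logits, T=20)   # much smoother targets
```

Both distributions rank the classes identically; the high-temperature one simply carries more information about the non-maximal classes, which is what the student is trained on.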
http://www.shortscience.org/paper?bibtexKey=journals/corr/1511.04508#davidstutz
Published: Tue, 26 Jun 2018 18:29:02 -0600

Training Region-based Object Detectors with Online Hard Example Mining (arXiv:1604.03540, RyanDsouza)
The problem this paper tries to address is that the training set is distinguished by a large imbalance between the number of foreground examples and background examples. To make the point concrete: for sliding-window object detectors like the deformable parts model, the imbalance may be as extreme as 100,000 background examples to one annotated foreground example.
Before I proceed to give you the details of hard example mining, I just want to note that HEM in its essence is mostly w...
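In essence, the selection step runs the loss over all proposals and keeps only the hardest (highest-loss) ones for the backward pass; a minimal sketch (the paper additionally suppresses overlapping proposals with NMS before selecting):

```python
import numpy as np

def select_hard_examples(losses, k):
    """Online hard example mining, in essence: score every region proposal
    with the current loss, keep only the k highest-loss (hardest) ones,
    and backpropagate through those alone."""
    return np.argsort(losses)[::-1][:k]

losses = np.array([0.05, 2.3, 0.01, 0.9, 1.7, 0.02])  # mostly easy backgrounds
hard_idx = select_hard_examples(losses, k=3)
# Keeps the proposals with losses 2.3, 1.7 and 0.9.
```

This directly counters the foreground/background imbalance: the flood of easy background proposals scores near-zero loss and is simply never selected.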
http://www.shortscience.org/paper?bibtexKey=journals/corr/1604.03540#ryandsouza
Published: Tue, 26 Jun 2018 14:30:07 -0600

Prototypical Networks for Few-shot Learning (arXiv:1703.05175, CodyWild)
This paper describes an architecture designed for generating class predictions based on a set of features in situations where you may only have a few examples per class, or even where you see entirely new classes at test time. Some prior work has approached this problem in ridiculously complex fashion, up to and including training a network to predict the gradient outputs of a meta-network that it thinks would best optimize loss, given a new class. The method of Prototypical Networks prides its...
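The core of the method fits in a few lines: a class prototype is the mean of that class's support embeddings, and a query is assigned to the nearest prototype. The embeddings below are synthetic stand-ins for the output of the learned embedding network:

```python
import numpy as np

def prototypes(embeddings, labels):
    """Class prototype = mean of the support embeddings of that class."""
    classes = np.unique(labels)
    return classes, np.stack([embeddings[labels == c].mean(axis=0) for c in classes])

def classify(query, classes, protos):
    """Predict the class whose prototype is nearest in Euclidean distance."""
    d = np.linalg.norm(protos - query, axis=1)
    return classes[np.argmin(d)]

emb = np.array([[0.0, 0.0], [0.2, 0.0], [2.0, 2.0], [2.2, 2.0]])
lab = np.array([0, 0, 1, 1])
cls, protos = prototypes(emb, lab)
pred = classify(np.array([1.9, 2.1]), cls, protos)   # nearest prototype: class 1
```

Because prototypes are just means in embedding space, entirely new classes at test time need only a handful of support examples to define a usable prototype.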
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.05175#decodyng
Published: Tue, 26 Jun 2018 05:00:56 -0600

Word Translation Without Parallel Data (arXiv:1710.04087, CodyWild)
The core goal of this paper is to perform in an unsupervised (read: without parallel texts) way what other machine translation researchers had previously only effectively performed in a supervised way: the creation of a word-to-word translational mapping between natural languages. To frame the problem concretely: the researchers start with word embeddings learned in each language independently, and their desired output is a set of nearest neighbors for a source word that contains the true target...
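For contrast, the supervised approach this work measures itself against fits an orthogonal linear map between the two embedding spaces from a seed dictionary of word pairs (the Procrustes solution); the paper's point is to obtain a comparable mapping without any such dictionary. A sketch with synthetic "embeddings" standing in for real ones:

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X W^T - Y||_F for paired embeddings X, Y:
    W = U V^T from the SVD of Y^T X. (Supervised baseline; the unsupervised
    method must reach a similar map without the paired data.)"""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 4))                  # source-language embeddings
R = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # hidden rotation between spaces
Y = X @ R.T                                   # target embeddings (synthetic pairs)
W = procrustes_map(X, Y)
# W recovers the hidden rotation, so X @ W.T reproduces Y and nearest-neighbor
# lookup in the mapped space finds the paired target word.
```

Restricting W to be orthogonal preserves distances, which is why nearest-neighbor retrieval in the mapped space remains meaningful.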
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.04087#decodyng
Published: Tue, 26 Jun 2018 04:58:44 -0600