ShortScience.org Latest Summaries
http://www.shortscience.org/

**Learning to Navigate in Cities Without a Map** by CodyWild (Wed, 22 May 2019 21:31:01 +0000) [arXiv:1804.00168]
This paper out of DeepMind used a Google StreetView dataset and set out to train a network capable of navigating to a given goal destination, without knowing where it was on any bird's-eye map, and with its only input being photographic viewpoint images of its current location and orientation. This was done through a framework of reinforcement learning, where the model is conditioned on a representation of its goal, and given the image features of its current view of the world, and has to take ac...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1804-00168#decodyng

**Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask** by CodyWild (Wed, 22 May 2019 06:16:23 +0000) [arXiv:1905.01067]
The Lottery Ticket Hypothesis is the idea that you can train a deep network, set all but a small percentage of its high-magnitude weights to zero, and retrain the network using the connection topology of the remaining weights, but only if you re-initialize the unpruned weights to the values they had at the beginning of the first training. This suggests that part of the value of training such big networks is not that we need that many parameters to use their expressive capacity, but that we n...
http://www.shortscience.org/paper?bibtexKey=zhou2019deconstructing#decodyng
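The prune-and-rewind procedure the hypothesis describes can be sketched in a few lines of numpy. This is a minimal illustration; the function and parameter names are mine, not the paper's, and real experiments prune layer-wise and iteratively.

```python
import numpy as np

def lottery_ticket_reset(w_init, w_trained, keep_frac=0.2):
    """Sketch of the lottery-ticket rewind step: keep only the
    highest-magnitude trained weights, and reset the survivors to
    their original initialization values (names are illustrative)."""
    k = max(1, int(keep_frac * w_trained.size))
    # threshold = magnitude of the k-th largest trained weight
    thresh = np.sort(np.abs(w_trained).ravel())[-k]
    mask = (np.abs(w_trained) >= thresh).astype(w_init.dtype)
    # pruned weights are zero; unpruned ones rewind to their init values
    return mask * w_init, mask

rng = np.random.default_rng(0)
w0 = rng.normal(size=(4, 4))   # weights at initialization
wt = rng.normal(size=(4, 4))   # stand-in for weights after training
w_ticket, mask = lottery_ticket_reset(w0, wt, keep_frac=0.25)
```

The "winning ticket" claim is then that retraining from `w_ticket` with the mask held fixed matches the full network's performance.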

**GloVe: Global Vectors for Word Representation** by Ablaikhan Akhazhanov (Tue, 21 May 2019 06:16:52 +0000)
Stanford’s paper on Global Vectors for Word Representation proposes one of the few extremely popular word embedding methods in NLP. GloVe takes advantage of both global corpus statistics and local context window methods by constructing the word-context co-occurrence matrix and reducing its dimensionality while preserving as much of the variance as possible. It builds a feature space with additive compositionality while preserving statistically meaningful word occurrence information extracted from the...
http://www.shortscience.org/paper?bibtexKey=conf/emnlp/PenningtonSM14#gkcalat
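As a toy illustration of the global statistics GloVe starts from, here is the symmetric word-context co-occurrence count matrix for a tiny corpus. Distance weighting and the weighted least-squares embedding fit itself are omitted; the corpus and window size are made up.

```python
import numpy as np

def cooccurrence_matrix(tokens, vocab, window=2):
    """Word-context co-occurrence counts of the kind GloVe is trained
    on: count every pair within a symmetric context window."""
    idx = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                X[idx[w], idx[tokens[j]]] += 1
    return X

vocab = ["the", "cat", "sat", "on", "mat"]
X = cooccurrence_matrix(["the", "cat", "sat", "on", "the", "mat"], vocab)
```

GloVe then fits word vectors so that dot products approximate the logarithms of these (distance-weighted) counts.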

**A Maximum Entropy Approach to Natural Language Processing** by Ablaikhan Akhazhanov (Mon, 20 May 2019 07:09:34 +0000)
A fundamental paper by Adam Berger and his colleagues at IBM Watson Research Center presents a thorough guide on how to apply maximum entropy (ME) modelling in NLP. The authors follow the principle of maximum entropy and show the optimality of their procedure as well as its duality relationship with maximum log-likelihood estimation. Importantly, the paper proposes a method for automatic feature selection, perhaps the most critical step in the entire approach. Empirical results from Candide, an ...
http://www.shortscience.org/paper?bibtexKey=journals/coling/BergerPP96#gkcalat
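The principle itself can be illustrated on a toy problem: the maximum-entropy distribution subject to a fixed expectation constraint has exponential-family form, and its single Lagrange multiplier can be found numerically. This is a generic sketch, not the paper's feature-based NLP models; the values and target are made up.

```python
import math

def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0):
    """Maximum-entropy distribution over `values` with a fixed mean:
    p(x) proportional to exp(lambda * x), with lambda found by
    bisection (the mean is monotone increasing in lambda)."""
    def mean(lam):
        ws = [math.exp(lam * v) for v in values]
        z = sum(ws)
        return sum(w * v for w, v in zip(ws, values)) / z
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    ws = [math.exp(lam * v) for v in values]
    z = sum(ws)
    return [w / z for w in ws]

# with a symmetric support and mean 1.0, the constraint adds no
# information, so the maxent answer is the uniform distribution
p = maxent_distribution([0, 1, 2], target_mean=1.0)
```

The paper's models are the same construction with many binary feature functions instead of a single mean constraint.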

**Decoupled Weight Decay Regularization** by CodyWild (Mon, 20 May 2019 07:07:24 +0000) [arXiv:1711.05101]
A few years ago, a paper came out demonstrating that adaptive gradient methods (which dynamically scale gradient updates in a per-parameter way according to the magnitudes of past updates) have a tendency to generalize less well than non-adaptive methods, even though adaptive methods sometimes look more performant in training and are easier to hyperparameter-tune. The 2017 paper offered a theoretical explanation for this fact based on Adam learning less complex solutions than SGD; this paper offe...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.05101#decodyng
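The paper's central distinction, L2 regularization folded into the gradient versus weight decay applied outside the adaptive update, shows up even in a single bias-corrected Adam step. A minimal sketch with illustrative values:

```python
import numpy as np

def adam_step(w, grad, m, v, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, t=1):
    """One bias-corrected Adam update (single-parameter sketch)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w0, g, wd = 1.0, 0.5, 0.1
# (a) L2 regularization: decay folded into the gradient, then passed
#     through Adam's adaptive rescaling
w_l2, _, _ = adam_step(w0, g + wd * w0, 0.0, 0.0)
# (b) decoupled weight decay (AdamW-style): decay applied directly to
#     the weights, outside the adaptive update
w_adamw, _, _ = adam_step(w0, g, 0.0, 0.0)
w_adamw -= 0.1 * wd * w0  # lr * wd * w (scaling convention illustrative)
```

Note how the L2 term's contribution gets normalized away by Adam's adaptive denominator (the step is roughly `lr` either way), while the decoupled decay actually shrinks the weight by an extra amount, which is exactly the asymmetry the paper analyzes.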

**Non-Projective Dependency Parsing using Spanning Tree Algorithms** by Ablaikhan Akhazhanov (Mon, 20 May 2019 07:06:04 +0000)
The paper, written by an international collaboration between UPenn and CUNI, presents an original parsing approach that expands the conventional projective tree-based method to include non-projective dependencies in text. The authors represent words and their relationships as the vertices and edges of a complete directed graph, respectively, and then employ the Chu-Liu-Edmonds algorithm to find maximum spanning trees, allowing non-projective parses. Importantly, they demonstrated better accuracy for Cze...
http://www.shortscience.org/paper?bibtexKey=conf/naacl/McDonaldPRH05#gkcalat

**Distributed representations of words and phrases and their compositionality** by Ablaikhan Akhazhanov (Mon, 20 May 2019 07:05:22 +0000)
The paper presents the famous Word2Vec model, which became ubiquitous in numerous NLP applications, partially owing to its linear feature space with additive compositionality. To be fair, the paper is an extension of previously presented work by Tomas Mikolov and his colleagues on distributed representations of words and phrases. The proposed approach is based on the skip-gram model and introduces four novel methods that significantly improve training speed and performance. Particularly,...
http://www.shortscience.org/paper?bibtexKey=mikolov2013distributed#gkcalat
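For concreteness, the (center, context) pairs the skip-gram model is trained on can be generated like this; subsampling and negative sampling, two of the paper's speed-ups, are omitted, and the corpus is made up.

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in the skip-gram
    model: each word predicts its neighbors within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["king", "queen", "man", "woman"], window=1)
```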

**A Neural Probabilistic Language Model** by Ablaikhan Akhazhanov (Mon, 20 May 2019 07:01:35 +0000)
Yoshua Bengio’s visionary work on probabilistic language modelling had a huge impact on the field of Natural Language Processing. Although it was published nearly 20 years ago, it is still relevant to modern NLP solutions. Subsequent works went on to perfect the state of the art in NLP, although it took more than 10 years for the paper to receive significant attention in the field.
The authors approach the fundamental problem of an exponentially growing number of trainable parameters with ...
http://www.shortscience.org/paper?bibtexKey=journals/jmlr/BengioDVJ03#gkcalat

**Meta-learners' learning dynamics are unlike learners'** by CodyWild (Mon, 20 May 2019 07:00:25 +0000) [arXiv:1905.01320]
Meta-learning, the idea of training models on some distribution of tasks in the hope that they can then learn more quickly on new tasks because they have “learned how to learn” similar tasks, has become a more central and popular research field in recent years. Although there is a veritable zoo of different techniques (to an amusingly literal degree; there’s an emergent fad of naming new methods after animals), the general idea is: have your inner loop consist of training a model on...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1905.01320#decodyng

**MixMatch: A Holistic Approach to Semi-Supervised Learning** by CodyWild (Sun, 19 May 2019 05:01:50 +0000) [arXiv:1905.02249]
As per the “holistic” in the paper title, the goal of this work is to take a suite of existing work within semi-supervised learning, and combine many of its ideas into one training pipeline that can (with really impressive empirical success) leverage the advantages of those different ideas.
The core premise of semi-supervised learning is that, given true-label training signal from a small number of labels, you can leverage large amounts of unlabeled data to improve your model. A central intu...
http://www.shortscience.org/paper?bibtexKey=berthelot2019mixmatch#decodyng
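One of the concrete ingredients MixMatch combines is label guessing on unlabeled data: predictions over several augmentations of an image are averaged and then sharpened toward a one-hot target. A minimal sketch of the sharpening step (the example predictions and temperature are illustrative):

```python
import numpy as np

def sharpen(p, T=0.5):
    """MixMatch-style label sharpening: raise class probabilities to
    1/T and renormalize; T < 1 pushes the distribution toward one-hot."""
    p = np.asarray(p, dtype=float) ** (1.0 / T)
    return p / p.sum()

# average predictions over two augmentations of one unlabeled image,
# then sharpen the result into a training target
guesses = np.array([[0.6, 0.3, 0.1],
                    [0.4, 0.4, 0.2]])
q = sharpen(guesses.mean(axis=0))
```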

**Multitask Soft Option Learning** by CodyWild (Sat, 18 May 2019 06:45:25 +0000) [arXiv:1904.01033]
This paper blends concepts from variational inference and hierarchical reinforcement learning, learning skills or “options” out of which master policies can be constructed, in a way that allows for both information transfer across tasks and specialization on any given task.
The idea of hierarchical reinforcement learning is that instead of maintaining one single policy distribution (a learned mapping between world-states and actions), a learning system will maintain multiple simpler polici...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-01033#decodyng

**Group Normalization** by Hadrien Bertrand (Fri, 17 May 2019 06:03:36 +0000) [arXiv:1803.08494]
Code:
# Summary
Batch Normalization doesn't work well with small batch sizes, which are often required for memory-intensive tasks such as detection or segmentation, or for memory-intensive data such as 3D images, videos or high-res images.
Group Normalization is a simple alternative that is independent of the batch size.
It works like BN, except with a different set of features over which the mean and std are computed.
The $\gamma$ and $\beta$ are learned per group and applied as...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.08494#hbertrand
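A minimal numpy sketch of the normalization step for NCHW input, assuming the usual grouping of contiguous channels (the optional affine parameters here are illustrative):

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5, gamma=None, beta=None):
    """Group Normalization sketch: for each sample, normalize over the
    (channels-in-group, H, W) axes, so the statistics are independent
    of the batch size."""
    N, C, H, W = x.shape
    g = x.reshape(N, num_groups, C // num_groups, H, W)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    out = g.reshape(N, C, H, W)
    if gamma is not None:  # optional learnable scale/shift
        out = out * gamma.reshape(1, C, 1, 1) + beta.reshape(1, C, 1, 1)
    return out

x = np.random.default_rng(0).normal(size=(2, 4, 3, 3))
y = group_norm(x, num_groups=2)
```

Setting `num_groups=C` recovers Instance Norm, and `num_groups=1` recovers Layer Norm, which is why GN interpolates between the two.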

**Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks** by CodyWild (Thu, 16 May 2019 20:24:25 +0000) [arXiv:1810.09536]
This paper came on my radar after winning Best Paper recently at ICLR, and all in all I found it a clever way of engineering a somewhat complicated inductive bias into a differentiable structure. The empirical results weren’t compelling enough to suggest that this structural shift made a regime-change difference in performance, but it does seem to have some consistently stronger ability to do syntactic evaluation across large gaps in sentences.
The core premise of this paper is that, while la...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1810-09536#decodyng

**S$^\mathbf{4}$L: Self-Supervised Semi-Supervised Learning** by CodyWild (Wed, 15 May 2019 05:56:21 +0000) [arXiv:1905.03670]
It’s possible I’m missing something here, but my primary response to reading this paper is just a sense of confusion: an approach seems to be implicitly presented as novel, when there doesn’t seem to me to be a clear mechanism that has changed from prior work. The premise of this paper is that self-supervised learning techniques (a subcategory of unsupervised learning, where losses are constructed based on reconstruction or perturbation of the original image) should be made into supe...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1905.03670#decodyng

**Adversarial Examples Are Not Bugs, They Are Features** by CodyWild (Tue, 14 May 2019 06:04:06 +0000) [arXiv:1905.02175]
It didn’t hit me how much this paper was a pun until I finished it, and in retrospect, I say, bravo.
This paper focuses on adversarial examples, and argues that, at least in some cases, adversarial perturbations aren’t purely overfitting failures on the part of the model, but actual features that generalize to the test set. This conclusion comes from a set of two experiments:
- In one, the authors create a dataset that only contains what they call “robust features”. They do this by tak...
http://www.shortscience.org/paper?bibtexKey=ilyas2019adversarial#decodyng
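As background for what an adversarial perturbation is, here is the classic gradient-sign construction on a toy linear classifier; this is standard prior art, not this paper's robust/non-robust dataset experiments, and all values are made up.

```python
import numpy as np

def fgsm_linear(x, w, b, y, eps):
    """Fast-gradient-sign perturbation for a linear classifier
    sign(w.x + b): step against the gradient of the margin
    y * (w.x + b), which w.r.t. x is y * w."""
    return x - eps * np.sign(y * w)

w = np.array([1.0, -2.0, 3.0])
b = 0.0
x = np.array([0.5, -0.5, 0.5])   # score = 3.0, confidently class +1
y = 1
x_adv = fgsm_linear(x, w, b, y, eps=1.1)
score_adv = float(w @ x_adv + b)  # perturbation flips the prediction
```

In high dimensions the same construction flips predictions with a much smaller `eps`, because the per-coordinate nudges accumulate across thousands of dimensions.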

**The Marginal Value of Adaptive Gradient Methods in Machine Learning** by CodyWild (Mon, 13 May 2019 07:25:29 +0000) [arXiv:1705.08292]
In modern machine learning, gradient descent has diversified into a zoo of subtly distinct techniques, all designed, analytically, heuristically, or practically, to ease or accelerate our model’s path through multidimensional loss space. A solid contingent of these methods are Adaptive Gradient methods, which scale the size of gradient updates according to variously calculated historical averages or variances of the vector update, which has the effect of scaling down the updates along feature...
http://www.shortscience.org/paper?bibtexKey=journals/corr/WilsonRSSR17#decodyng

**Diversity is All You Need: Learning Skills without a Reward Function** by CodyWild (Sun, 12 May 2019 07:49:43 +0000) [arXiv:1802.06070]
[I do occasionally wonder if people will look back on the “Is All You Need” naming trend with genuine confusion in a few years. “Really…all you need?”]
This paper merges the ideas of curiosity-based learning and hierarchical reinforcement learning, to propose an architecture for learning distinctive skills based solely on an incentive to make those skills distinguishable from one another and relatively internally random, rather than because they’re directly useful in achieving some reward.
The...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-06070#decodyng

**Exploration by Random Network Distillation** by CodyWild (Fri, 10 May 2019 07:03:49 +0000) [arXiv:1810.12894]
Reward functions are a funny part of modern reinforcement learning: enormously salient from the inside, if you’re coding or working with RL systems, yet not as clearly visible from the outside perspective, where we just see agents playing games in what seem to be human-like ways. Seen from this angle, it can be easy to imagine that the mechanisms being used to learn are human-like as well. And, it’s true that some of the Atari games being examined are cases where there is in fa...
http://www.shortscience.org/paper?bibtexKey=burda2018exploration#decodyng
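The mechanism itself is compact: a fixed, randomly initialized target network is distilled into a trained predictor, and the predictor's error on a state serves as an exploration bonus, since it is high for states unlike those seen before. A toy numpy sketch (network shapes and names are illustrative, and the predictor's training loop is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(din, dh, dout, rng):
    """Tiny two-layer MLP with fixed random weights."""
    W1 = rng.normal(size=(din, dh))
    W2 = rng.normal(size=(dh, dout))
    return lambda x: np.tanh(x @ W1) @ W2

target = make_net(4, 16, 8, rng)      # fixed, never trained
predictor = make_net(4, 16, 8, rng)   # would be trained to match target

def intrinsic_reward(obs):
    """RND exploration bonus: prediction error against the fixed
    random target network."""
    err = predictor(obs) - target(obs)
    return float(np.mean(err ** 2))

r = intrinsic_reward(np.ones(4))
```

As the predictor is trained on visited states, their bonus decays to zero, so the agent is steadily pushed toward unvisited ones.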

**On the Pitfalls of Measuring Emergent Communication** by CodyWild (Thu, 09 May 2019 04:54:32 +0000) [arXiv:1903.05168]
Language seems obviously useful to humans in coordinating on complicated tasks, and, the logic goes, you might expect that if you gave agents in a multi-agent RL system some amount of shared interest and the capacity to communicate, they would use that communication channel to coordinate actions. This is particularly true in cases where some part of the environment is only visible to one of the agents. A number of papers in the field have set up such scenarios, and argued that meaningful c...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-05168#decodyng

**The Lottery Ticket Hypothesis at Scale** by CodyWild (Wed, 08 May 2019 06:41:35 +0000) [arXiv:1903.01611]
In 2018, a group including many of the authors of this updated paper argued for a theory of deep neural network optimization that they called the “Lottery Ticket Hypothesis”. It framed itself as a solution to what was otherwise a confusing apparent contradiction: that you could prune or compress trained networks to contain a small percentage of their trained weights without loss of performance, but also that if you tried to train a comparably small network (comparable to the post-training pru...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-01611#decodyng

**Generating Long Sequences with Sparse Transformers** by CodyWild (Tue, 07 May 2019 06:18:29 +0000) [arXiv:1904.10509]
The Transformer, a somewhat confusingly-named model structure that uses attention mechanisms to aggregate information for understanding or generating data, has been having a real moment in the last year or so, with GPT-2 being only the most well-publicized tip of that iceberg. It has lots of advantages: the obvious attractions of strong performance, as well as the ability to train in parallel across parts of a sequence, which RNNs can’t do because of the need to build up and maintain state. Ho...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-10509#decodyng
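The core trick is replacing full O(n²) attention with fixed sparse patterns that still connect all positions within a couple of layers. A sketch of a causal strided mask in that spirit (the paper's exact patterns and stride choices differ in detail):

```python
import numpy as np

def strided_sparse_mask(n, stride):
    """Boolean attention mask in the spirit of the Sparse Transformer's
    strided pattern: each position attends to its recent window and to
    one 'summary' column per stride block (causal)."""
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1):  # j <= i enforces causality
            local = (i - j) < stride              # recent positions
            summary = (j % stride) == stride - 1  # strided columns
            mask[i, j] = local or summary
    return mask

M = strided_sparse_mask(8, stride=4)
```

Each row has at most about `2 * stride` allowed positions instead of `n`, which is where the memory and compute savings on long sequences come from.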

**Unsupervised Data Augmentation** by CodyWild (Mon, 06 May 2019 06:43:39 +0000) [arXiv:1904.12848]
This paper focuses on taking advances from the (mostly heuristic, practical) world of data augmentation for supervised learning, and applying those to the unsupervised setting, as a way of inducing better performance in a semi-supervised environment (with many unlabeled points, and few labeled ones).
Data augmentation has been a mostly behind-the-scenes implementation detail in modern deep learning: minor modifications like shifting an image by a few pixels, rotating it slightly, or flipping i...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-12848#decodyng

**Counterpoint by Convolution** by CodyWild (Sun, 05 May 2019 02:46:39 +0000) [arXiv:1903.07227]
The Magenta group at Google is a consistent source of really interesting problems for machine learning to solve, in the vein of creative generation of art and music, as well as mathematically creative ways to solve those problems. In this paper, they tackle a new problem with some interesting model-structural implications: generating Bach chorales composed of polyphonic multi-instrument arrangements. On one level, this is similar to music generation problems that have been studied before, in tha...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-07227#decodyng

**Attention is not Explanation** by CodyWild (Fri, 03 May 2019 06:22:26 +0000) [arXiv:1902.10186]
Attention mechanisms are a common subcomponent within language models, initially as a part of recurrent models, and more recently as their own form of aggregating information over sequences, independent of the recurrence structure. Attention works by taking as input some sequence of inputs, in the most typical case embedded representations of words in a sentence, and learning a distribution of weights over those representations, which allows the network to aggregate the representations, typica...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1902-10186#decodyng

**DeepFace: Closing the Gap to Human-Level Performance in Face Verification** by Martin Thoma (Thu, 02 May 2019 06:21:07 +0000)
## General stuff about face recognition
Face recognition has 4 main tasks:
* **Face detection**: Given an image, draw a rectangle around every face
* **Face alignment**: Transform a face to be in a canonical pose
* **Face representation**: Find a representation of a face which is suitable for follow-up tasks (small size, computationally cheap to compare, invariant to irrelevant changes)
* **Face verification**: Images of two faces are given. Decide if it is the same person or not.
The face ve...
http://www.shortscience.org/paper?bibtexKey=conf/cvpr/TaigmanYRW14#martinthoma
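The verification task in the last bullet is typically reduced to a threshold test on the distance between face representations. A minimal sketch under that framing; the embeddings and threshold here are made up for illustration, not DeepFace's learned pipeline.

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=0.6):
    """Verification as a distance test on L2-normalized face
    representations: close embeddings -> same identity."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.linalg.norm(a - b)) < threshold

a = np.array([1.0, 0.0, 0.2])
b = np.array([0.9, 0.1, 0.2])   # nearby embedding: same person
c = np.array([-0.2, 1.0, 0.0])  # distant embedding: different person
```

This is why the representation step matters so much: once embeddings are good, verification is cheap to compute and needs no retraining for new identities.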

**Unsupervised Deep Feature Learning for Deformable Registration of MR Brain Images** by Anmol Sharma (Sat, 27 Apr 2019 21:36:31 +0000)
Accurate anatomical landmark correspondence is highly critical for medical image registration. Traditionally, many previous works proposed hand-crafted feature sets that can be used to establish correspondence. However, these features tend to be highly specialized in terms of application area and cannot always be generalized well to other applications without significant modifications. There have been other works that perform automatic feature extraction, but their reliance on lab...
http://www.shortscience.org/paper?bibtexKey=conf/miccai/WuKWGLS13#anmolsharma

**Brain Tumor Segmentation Using Convolutional Neural Networks in MRI Images** by Anmol Sharma (Thu, 18 Apr 2019 22:24:31 +0000)
Tumor segmentation from brain MRI sequences is usually done manually by the radiologist. Being a highly tedious and error-prone task, mainly due to factors such as human fatigue, an overabundance of MRI slices per patient, and an increasing number of patients, manual operation often leads to inaccurate delineation. Moreover, the use of qualitative measures of evaluation by radiologists results in high inter- and intra-observer error rates. There is an evident need for automated systems to perform this task...
http://www.shortscience.org/paper?bibtexKey=journals/tmi/PereiraPAS16#anmolsharma

**Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis** by Anmol Sharma (Thu, 18 Apr 2019 22:23:42 +0000)
Alzheimer's Disease (AD) is characterized by impairment of cognitive and memory function, mostly leading to dementia in elderly subjects. Over the last decade, it has been shown that neuroimaging can be a potential tool for the diagnosis of Alzheimer's Disease (AD) and its prodromal stage, Mild Cognitive Impairment (MCI), and also that fusing different modalities can provide complementary information that enhances diagnostic accuracy. Multimodal information like that from MRI and PET can b...
http://www.shortscience.org/paper?bibtexKey=journals/neuroimage/SukLS14#anmolsharma

**V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation** by Anmol Sharma (Thu, 18 Apr 2019 22:22:46 +0000) [arXiv:1606.04797]
Medical image segmentation has been a classic problem in medical image analysis, with a large body of research behind it. Many approaches worked by designing hand-crafted features, while others worked using global or local intensity cues. These approaches were sometimes extended to 3D, but most of the algorithms work with 2D images (or 2D slices of a 3D image). It is hypothesized that using the full 3D volume of a scan may improve segmentation performance due to the amount of context that ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/MilletariNA16#anmolsharma

**Dermatologist-level classification of skin cancer with deep neural networks** by Anmol Sharma (Thu, 18 Apr 2019 22:22:05 +0000)
Skin cancer is one of the most common cancer types in humans. Primarily, the lesion is diagnosed visually through a series of 2D color images taken of the affected area. This may be followed by dermoscopic analysis, a biopsy and histopathological examination. Automated classification of skin lesions using images is a challenging task owing to the fine-grained variability in the appearance of skin lesions.
To this end, Esteva et al. propose a deep learning based solution to automate the task of ...
http://www.shortscience.org/paper?bibtexKey=journals/nature/EstevaKNKSBT17#anmolsharma

**Shape Registration in Implicit Spaces Using Information Theory and Free Form Deformations** by Anmol Sharma (Thu, 18 Apr 2019 22:21:26 +0000)
The shape registration problem has been an active research topic in the computational geometry, computer vision, medical image analysis and pattern recognition communities. Also called shape alignment, it has extensive uses in recognition, indexing, retrieval, generation and other downstream analysis of a set of shapes. There have been a variety of works that approach this problem, with the methods varying mostly in terms of (what can be called the pillars of registration) the shape representation, transfor...
http://www.shortscience.org/paper?bibtexKey=journals/pami/HuangPM06#anmolsharma

**Robust Point Set Registration Using Gaussian Mixture Models** by Anmol Sharma (Thu, 18 Apr 2019 22:20:35 +0000)
The point pattern matching problem has been an active research topic in the computational geometry and pattern recognition communities. Such point sets arise in a variety of applications, and the problem of registering them is encountered in stereo matching, feature-based image registration and so on. Mathematically, the problem of registering two point sets translates to the following: let $\{\mathcal{M}, \mathcal{S}\}$ be two finite point sets which need to be re...
http://www.shortscience.org/paper?bibtexKey=journals/pami/JianV11#anmolsharma
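The flavor of the approach can be sketched by treating each point set as an isotropic Gaussian mixture and descending on the L2 distance between the two mixtures, here restricted to a pure translation. The paper handles rigid and non-rigid transformations with a robust framework; the point sets, bandwidth and step sizes below are illustrative.

```python
import numpy as np

def gmm_l2_shift(model, scene, sigma=0.5, steps=200, lr=0.1):
    """Estimate a translation t aligning `model` to `scene` by
    maximizing the Gaussian-mixture correlation (equivalently,
    minimizing the L2 distance between the two mixtures over t)."""
    t = np.zeros(model.shape[1])
    for _ in range(steps):
        # gradient of -sum_{i,j} exp(-||m_i + t - s_j||^2 / (4 sigma^2))
        diff = (model + t)[:, None, :] - scene[None, :, :]  # (M, S, D)
        w = np.exp(-np.sum(diff**2, axis=2) / (4 * sigma**2))
        grad = np.sum(w[:, :, None] * diff, axis=(0, 1)) / (2 * sigma**2)
        t -= lr * grad
    return t

scene = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
model = scene + np.array([0.4, -0.3])   # same shape, shifted
t = gmm_l2_shift(model, scene)           # recovers roughly (-0.4, 0.3)
```

The attraction of the mixture view is that it needs no explicit point correspondences and degrades gracefully under noise and outliers.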

**Nonrigid registration using free-form deformations: application to breast MR images** by Anmol Sharma (Thu, 18 Apr 2019 22:08:30 +0000)
Despite being an ill-posed problem, non-rigid image registration has been the subject of numerous works, which apply the framework to applications where rigid and affine transformations cannot completely model the variations between image sets. One such application of non-rigid registration is to register pre- and post-contrast breast MR images for estimating contrast uptake, which in turn is an indicator of tumor malignancy. Due to large variations between the pre- and post-contra...
http://www.shortscience.org/paper?bibtexKey=10.1109/42.796284#anmolsharma

**Non-rigid Image Registration Using Graph-cuts** by Anmol Sharma (Thu, 18 Apr 2019 22:04:39 +0000)
Image registration has been a well-studied problem in the medical image analysis community, with rigid registration taking much of the spotlight. In addition to rigid registration, non-rigid registration is of great interest due to its applications in inter-patient modality registration, where deformations of organs are highly pronounced. However, non-rigid registration is an ill-posed problem with numerous degrees of freedom, which makes finding the best transformation from the source to the final image very ...
http://www.shortscience.org/paper?bibtexKey=conf/miccai/TangC07#anmolsharma

**Medical image registration using mutual information** by Anmol Sharma (Thu, 18 Apr 2019 22:03:24 +0000)
Current medical imaging modalities like Computed Tomography (CT), Magnetic Resonance Imaging (MRI) or Positron Emission Tomography (PET) have allowed minimally invasive imaging of internal organs. Rapid advancement in these technologies has led to an influx of data, which, along with rising clinical need, has led to a demand for quantitative image interpretation in routine practice. Some of the applications include volumetric measurements of regions of the brain, surgery or radiotherapy pla...
http://www.shortscience.org/paper?bibtexKey=journals/pieee/MaesVS03#anmolsharma
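The similarity measure at the heart of this line of work is mutual information estimated from the joint intensity histogram of the two images. A minimal numpy sketch (image sizes and bin count are illustrative):

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """Mutual information between two images from their joint intensity
    histogram: MI = sum p(x,y) log( p(x,y) / (p(x) p(y)) )."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
img = rng.random((32, 32))
# an intensity-remapped copy (a crude stand-in for another modality)
# shares structure with img; an unrelated image does not
mi_same = mutual_information(img, 1.0 - img)
mi_rand = mutual_information(img, rng.random((32, 32)))
```

This is exactly why MI works across modalities: it rewards any consistent statistical relationship between intensities, not intensity equality, so inverted or remapped contrasts still score high.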

**A minimum description length approach to statistical shape modeling** by Anmol Sharma (Thu, 18 Apr 2019 22:02:30 +0000)
Active Shape Models brought with them the ability to intelligently deform to match various intra-shape variations according to a labelled training set of landmark points. However, the dependence of such methods on a manually marked, low-noise training set poses challenges due to inter-observer differences, which become even more pronounced in higher dimensions (3D). To this end, the authors propose a method that addresses this problem by introducing automatic shape modelling.
The method is based u...
http://www.shortscience.org/paper?bibtexKey=journals/tmi/DaviesTCWT02#anmolsharma

**Multiscale Vessel Enhancement Filtering** by Anmol Sharma (Thu, 18 Apr 2019 22:00:54 +0000)
Delineation of vessel structures in human vasculature forms the precursor to a number of clinical applications. Typically, the delineation is performed using both 2D (DSA) and 3D techniques (CT, MR, X-Ray Angiography). However, the decisions are still made using a maximum intensity projection (MIP) of the data. This is problematic since the MIP is also affected by other tissues of high intensity, and low-intensity vasculature may never be fully realized in the MIP compared to other tissues. This calls...
http://www.shortscience.org/paper?bibtexKey=conf/miccai/FrangiNVV98#anmolsharma

**Active Shape Models-Their Training and Application** by Anmol Sharma (Thu, 18 Apr 2019 21:59:41 +0000)
Object detection in 2D scenes has mostly been performed using model-based approaches, which model the appearance of certain objects of interest. Although such approaches tend to work well in cluttered, noisy and occluded settings, the failure of such models to adapt to intra-object variability, which is apparent in many domains like medical imaging where organ shapes tend to vary a lot, has led to a need for a more robust approach. To this end, Cootes et al. propose a training-based metho...
http://www.shortscience.org/paper?bibtexKey=journals/cviu/CootesTCG95#anmolsharma

**Random Walks for Image Segmentation** by Anmol Sharma (Thu, 18 Apr 2019 21:58:03 +0000)
Image segmentation has been a topic of research in the computer vision domain for decades. A multitude of methods have been proposed for segmentation, but most depend on high-level user input which guides the contour or boundaries towards the real boundaries. In order to come closer to a fully or partially automated solution, a novel method is proposed for performing multilabel, interactive image segmentation using the Random Walk algorithm as the fundamental driver of se...
http://www.shortscience.org/paper?bibtexKey=journals/pami/Grady06#anmolsharma
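Grady's formulation reduces the segmentation to linear algebra: the probability that a random walker starting at each unlabeled node first reaches a foreground seed solves a linear system in the graph Laplacian, and thresholding those probabilities gives the labels. A tiny sketch on a 4-node chain graph (edge weights and seeds are illustrative):

```python
import numpy as np

def random_walker_probs(W, seeds):
    """Random-walker probabilities on a weighted graph: with the
    Laplacian split into seeded (M) and free (U) blocks,
    L_U x = -B^T b gives the foreground probability of each free node."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    seeded = sorted(seeds)
    free = [i for i in range(n) if i not in seeds]
    b = np.array([seeds[s] for s in seeded], dtype=float)  # 1 = fg seed
    LU = L[np.ix_(free, free)]
    B = L[np.ix_(seeded, free)]
    x = np.linalg.solve(LU, -B.T @ b)
    probs = np.zeros(n)
    probs[free] = x
    for s in seeded:
        probs[s] = seeds[s]
    return probs

# 4-node chain 0-1-2-3 with uniform edge weights;
# node 0 is a foreground seed, node 3 a background seed
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
p = random_walker_probs(W, {0: 1, 3: 0})
```

On the chain this recovers the classic gambler's-ruin answer (probabilities 2/3 and 1/3 for the two interior nodes); on an image, `W` holds intensity-difference edge weights between neighboring pixels and `LU` is sparse.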

**Graph Cuts and Efficient N-D Image Segmentation** by Anmol Sharma (Thu, 18 Apr 2019 21:54:32 +0000)
Over the last decade and a half, a plethora of image segmentation algorithms have been proposed, which can be grouped into roughly four categories, given by a combination of two labels: explicit or implicit boundary representation, and variational or combinatorial methods. While classic methods like Snakes [1] and Level-Sets [2] belong to the explicit/variational and implicit/variational categories, there is another set of algorithms falling under the combinatorial domain...
http://www.shortscience.org/paper?bibtexKey=journals/ijcv/BoykovF06#anmolsharma

**Interactive live-wire boundary extraction** by Anmol Sharma (Thu, 18 Apr 2019 21:53:57 +0000)
Edge, contour or boundary detection in 2D images has been an area of active research, with a variety of different algorithms. However, due to the wide variety of image types and content, developing automatic segmentation algorithms has been challenging, while manual segmentation is tedious and time-consuming. Previous algorithms approaching this task have tried to incorporate higher-level constraints, energy functionals (snakes), and global properties (graph-based). However, these approaches still do ...
http://www.shortscience.org/paper?bibtexKey=journals/mia/BarrettM97#anmolsharma
http://www.shortscience.org/paper?bibtexKey=journals/mia/BarrettM97#anmolsharmaThu, 18 Apr 2019 21:53:16 +0000journals/tmi/PluempitiwiriyawejMWH052STACS: new active contour scheme for cardiac MR image segmentationAnmol SharmaAutomated segmentation of various anatomical structures of interest from medical images has been a well-grounded field of research in medical imaging. One such problem is segmenting the whole heart region from a sequence of magnetic resonance imaging (MRI), which is currently done manually and is time consuming and tedious. Although many automated techniques exist for this, the task remains challenging due to the complex nature of the problem, partly because of low contrast between heart...
http://www.shortscience.org/paper?bibtexKey=journals/tmi/PluempitiwiriyawejMWH05#anmolsharma
http://www.shortscience.org/paper?bibtexKey=journals/tmi/PluempitiwiriyawejMWH05#anmolsharmaThu, 18 Apr 2019 21:51:56 +000010.1007/bf001335702Snakes: Active contour modelsAnmol SharmaLow level tasks such as edge, contour and line detection are an essential precursor to any downstream image analysis processes. However, most of the approaches targeting these problems work as isolated and autonomous entities, without using any high-level image information such as context, global shapes, or user-level input. This leads to errors that can further propagate through the pipeline without providing an opportunity for future correction. In order to address this problem, Kass et al. in...
http://www.shortscience.org/paper?bibtexKey=10.1007/bf00133570#anmolsharma
http://www.shortscience.org/paper?bibtexKey=10.1007/bf00133570#anmolsharmaThu, 18 Apr 2019 21:42:18 +000010.1109/83.9022912Active contours without edgesAnmol SharmaTypically, energy minimization or snakes based object detection frameworks evolve a parametrized curve guided by some form of image gradient information. However, due to heavy reliance on gradients, the approaches tend to fail in scenarios where this information is misleading or unavailable. This cripples the snake and renders it unusable as it gets stuck in a local minimum away from the actual object. Moreover, the parametrized snake lacks the ability to model multiple evolving curves in a si...
http://www.shortscience.org/paper?bibtexKey=10.1109/83.902291#anmolsharma
http://www.shortscience.org/paper?bibtexKey=10.1109/83.902291#anmolsharmaThu, 18 Apr 2019 21:41:40 +0000conf/icml/KohL173Understanding Black-box Predictions via Influence Functionskangcheng**Goal**: identifying training points most responsible for a given prediction.
Given training points $z_1, \dots, z_n$, let loss function be $\frac{1}{n}\sum_{i=1}^nL(z_i, \theta)$
The influence function lets us compute the parameter change if $z$ were upweighted by some small $\epsilon$:
$$\hat{\theta}_{\epsilon, z} := \arg \min_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n L(z_i, \theta) + \epsilon L(z, \theta)$$
$$\mathcal{I}_{\text{up, params}}(z) := \frac{d\hat{\theta}_{\eps...
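This upweighting identity can be checked numerically on a 1-D least-squares toy problem (my own sketch, not the authors' code): the retrained parameter has a closed form, so the influence-predicted change can be compared against actually retraining with the point upweighted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D least-squares problem: L(z_i, theta) = 0.5 * (theta * x_i - y_i)^2
n = 50
x = rng.normal(size=n)
y = 2.0 * x + 0.1 * rng.normal(size=n)

def fit(weights):
    # Closed-form minimizer of sum_i weights_i * 0.5 * (theta * x_i - y_i)^2
    return np.sum(weights * x * y) / np.sum(weights * x * x)

theta_hat = fit(np.full(n, 1.0 / n))

# Influence of upweighting z = (x[0], y[0]): -H^{-1} * grad_theta L(z, theta_hat)
H = np.mean(x * x)                       # Hessian of the mean loss
grad = (theta_hat * x[0] - y[0]) * x[0]  # gradient of L(z, theta_hat)
influence = -grad / H

# Compare against actually retraining with z upweighted by a small epsilon
eps = 1e-4
w = np.full(n, 1.0 / n)
w[0] += eps
finite_diff = (fit(w) - theta_hat) / eps

assert np.isclose(influence, finite_diff, rtol=1e-2, atol=1e-8)
```

In a deep network the Hessian inverse is of course never formed explicitly; the paper uses Hessian-vector-product approximations instead.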
http://www.shortscience.org/paper?bibtexKey=conf/icml/KohL17#kangchenghou
http://www.shortscience.org/paper?bibtexKey=conf/icml/KohL17#kangchenghouMon, 15 Apr 2019 15:44:30 +00001611.08036journals/corr/KumraK162Robotic Grasp Detection using Deep Convolutional Neural NetworksANIRUDH NJ# **Introduction**
### **Goal of the paper**
* The goal of this paper is to use an RGB-D image to find the best pose for grasping an object using a parallel plate gripper.
* The goal of this algorithm is to also give an open loop method for manipulation of the object using vision data.
### **Previous Research**
* Even state-of-the-art grasp detection algorithms fail under real-world circumstances and cannot run in real time.
* To perform grasping a 7D grasp representation is used. But...
http://www.shortscience.org/paper?bibtexKey=journals/corr/KumraK16#anirudhnj
http://www.shortscience.org/paper?bibtexKey=journals/corr/KumraK16#anirudhnjThu, 11 Apr 2019 16:21:29 +00001711.02827journals/corr/1711.028272Inverse Reward DesigncapybaraletThe method they use basically tells the robot to reason as follows:
1. The human gave me a reward function $\tilde{r}$, selected in order to get me to behave the way they wanted.
2. So I should favor reward functions which produce that kind of behavior.
This amounts to doing RL (step 1) followed by IRL on the learned policy (step 2); see the final paragraph of section 4.
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.02827#capybaralet
http://www.shortscience.org/paper?bibtexKey=journals/corr/1711.02827#capybaraletTue, 19 Mar 2019 23:02:23 +00001903.00374journals/corr/1903.003743Model-Based Reinforcement Learning for AtariAnkesh AnandThis paper shows exciting results on using Model-based RL for Atari.
Model-based RL has shown impressive improvements in sample efficiency on Mujoco tasks ([Chua et al., 2018]()), so it's nice to see that the sample efficiency improvements carry over to pixel-based envs like Atari too.
Specifically, the authors show that their model-based method can do well on several Atari games after training on only 100K env steps (400K frames with FrameSkip 4) which roughly corresponds to 2 hours of game ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1903.00374#ankeshanand
http://www.shortscience.org/paper?bibtexKey=journals/corr/1903.00374#ankeshanandTue, 05 Mar 2019 17:25:09 +00001902.08605journals/corr/1902.086052Centroid Networks for Few-Shot Clustering and Unsupervised Few-Shot ClassificationgabrielDisclaimer: I am the first author.
# Executive summary
- The authors propose a new method, [*Centroid Networks*](), for learning to cluster.
- Given example clusterings of data, the goal is to learn how to cluster new data following the same criterion.
- Centroid Networks basically consist of running K-means on Prototypical Network features, plus many tricks.
- They evaluate Centroid Networks on Omniglot and miniImageNet (supervised few-shot classification benchmarks).
- Centroid Networks can...
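A toy version of the core step, plain K-means on already-embedded features (my own sketch; the embedding network and the paper's additional tricks are omitted):

```python
import numpy as np

def kmeans(feats, k, iters=20):
    """Plain Lloyd's K-means on (already embedded) features."""
    # Deterministic farthest-point initialization
    centroids = [feats[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(feats - c, axis=1) for c in centroids], axis=0)
        centroids.append(feats[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each point to its nearest centroid ...
        dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=-1)
        assign = dists.argmin(axis=1)
        # ... then recompute each centroid as its cluster mean
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = feats[assign == j].mean(axis=0)
    return assign, centroids

# Two well-separated blobs standing in for embedded points of two classes
rng = np.random.default_rng(1)
feats = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.1, size=(20, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.1, size=(20, 2)),
])
assign, _ = kmeans(feats, k=2)
```

If the learned embedding separates classes well, this clustering recovers the class partition without using any query labels.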
http://www.shortscience.org/paper?bibtexKey=journals/corr/1902.08605#gabriel
http://www.shortscience.org/paper?bibtexKey=journals/corr/1902.08605#gabrielWed, 27 Feb 2019 21:12:14 +00001706.03922journals/corr/WangJC172Analyzing the Robustness of Nearest Neighbors to Adversarial ExamplesDavid StutzWang et al. discuss the robustness of $k$-nearest neighbors against adversarial perturbations, providing both a theoretical analysis as well as a robust 1-nearest neighbor version. Specifically, for low $k$ it is shown that nearest neighbor is usually not robust. Here, robustness is judged in a distributional sense; so for fixed and low $k$, the lowest distance of any training sample to an adversarial sample tends to zero, even if the training set size increases. For $k \in \mathcal{O}(dn \log n...
http://www.shortscience.org/paper?bibtexKey=journals/corr/WangJC17#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/WangJC17#davidstutzSat, 16 Feb 2019 18:27:13 +0000conf/cvpr/SharifBR182On the Suitability of Lp-Norms for Creating and Preventing Adversarial ExamplesDavid StutzSharif et al. study the effectiveness of $L_p$ norms for creating adversarial perturbations. In this context, their main discussion revolves around whether $L_p$ norms are sufficient and/or necessary for perceptual similarity. Their main conclusion is that $L_p$ norms are neither necessary nor sufficient to ensure perceptual similarity. For example, an adversarial example might be within a specific $L_p$ ball, but humans might still identify it as not similar enough to the originally attacked sam...
http://www.shortscience.org/paper?bibtexKey=conf/cvpr/SharifBR18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/cvpr/SharifBR18#davidstutzSat, 16 Feb 2019 18:17:41 +0000conf/nips/RatnerEHDR173Learning to Compose Domain-Specific Transformations for Data Augmentation.David StutzRatner et al. train a generative adversarial network to learn domain-specific sequences of transformations useful for data augmentation. In particular, as indicated in Figure 1, the generator learns to predict sequences of user-specified transformations and the classifier is intended to distinguish the original images from the transformed ones. For training, the authors use reinforcement learning, because the transformations are not necessarily differentiable – which makes usage of the propos...
http://www.shortscience.org/paper?bibtexKey=conf/nips/RatnerEHDR17#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/nips/RatnerEHDR17#davidstutzSat, 16 Feb 2019 18:08:46 +00001803.04765journals/corr/1803.047653Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep LearningDavid StutzPapernot and McDaniel introduce deep k-nearest neighbors where nearest neighbors are found at each intermediate layer in order to improve interpretability and robustness. Personally, I really appreciated reading this paper; thus, I will not only discuss the actually proposed method but also highlight some ideas from their thorough survey and experimental results.
First, Papernot and McDaniel provide a quite thorough survey of relevant work in three disciplines: confidence, interpretability and ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.04765#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.04765#davidstutzSat, 16 Feb 2019 18:05:33 +00001801.04693journals/corr/1801.046933Towards Imperceptible and Robust Adversarial Example Attacks against Neural NetworksDavid StutzLuo et al. propose a method to compute less perceptible adversarial examples compared to standard methods constrained in $L_p$ norms. In particular, they consider the local variation of the image and argue that humans are more likely to notice larger variations in low-variance regions than vice versa. The sensitivity of a pixel is therefore defined as one over its local variance, meaning that low-variance pixels are more sensitive to perturbations. They propose a simple algorithm which iteratively sorts pixels by...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.04693#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1801.04693#davidstutzSat, 16 Feb 2019 17:36:45 +00001706.04599journals/corr/1706.045993On Calibration of Modern Neural NetworksDavid StutzGuo et al. study calibration of deep neural networks as a post-processing step. Here, calibration means a correction of the predicted confidence scores, as these are commonly too overconfident in recent deep networks. They consider several state-of-the-art post-processing steps for calibration, but show that, surprisingly, a simple linear mapping, or even scaling, works very well. So if $z_i$ are the logits of the network, then (the network being fixed) a parameter $T$ is found such tha...
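The truncated idea, fitting a single temperature $T$ on held-out logits, can be sketched as follows (a toy grid search; the authors' exact optimization procedure may differ):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(z, labels, T):
    p = softmax(z, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(z, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the single scalar T that minimizes held-out NLL."""
    return min(grid, key=lambda T: nll(z, labels, T))

# Overconfident toy logits: huge margins, but only ~70% of labels agree
rng = np.random.default_rng(0)
n, k = 500, 3
pred = rng.integers(0, k, size=n)
z = rng.normal(scale=0.1, size=(n, k))
z[np.arange(n), pred] += 4.0
labels = pred.copy()
flip = rng.random(n) < 0.3
labels[flip] = (pred[flip] + 1) % k

T = fit_temperature(z, labels)
```

Since dividing logits by a positive $T$ is a monotone transform, the predicted class never changes; only the confidence is softened.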
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.04599#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.04599#davidstutzSat, 16 Feb 2019 17:30:04 +00001710.10547journals/corr/abs-1710-105472Interpretation of Neural Networks is FragileDavid StutzGhorbani et al. show that neural network visualization techniques, often introduced to improve interpretability, are susceptible to adversarial examples. For example, they consider common feature-importance visualization techniques and aim to find an adversarial example that does not change the predicted label but changes the original interpretation – e.g., as measured on some of the most important features. Examples of the so-called top-1000 attack where the 1000 most important features are changed du...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1710-10547#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1710-10547#davidstutzSat, 16 Feb 2019 17:15:35 +00001801.02612journals/corr/abs-1801-026122Spatially Transformed Adversarial ExamplesDavid StutzXiao et al. propose adversarial examples based on spatial transformations. Actually, this work is very similar to the adversarial deformations of [1]. In particular, a deformation flow field is optimized (allowing individual deformations per pixel) to cause a misclassification. The distance of the perturbation is computed on the flow field directly. Examples on MNIST are shown in Figure 1 – it can clearly be seen that most pixels are moved individually and no kind of smoothness is enforced. Th...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1801-02612#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1801-02612#davidstutzMon, 11 Feb 2019 18:37:03 +00001710.10733journals/corr/1710.107332Attacking the Madry Defense Model with $L_1$-based Adversarial ExamplesDavid StutzSharma and Chen provide an experimental comparison of different state-of-the-art attacks against the adversarial training defense by Madry et al. [1]. They consider several attacks, including the Carlini-Wagner attacks [2], elastic net attacks [3] as well as projected gradient descent [1]. Their experimental finding – that the defense by Madry et al. can be broken by increasing the allowed perturbation size (i.e., epsilon) – should not be surprising. Every network trained adversarially will ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.10733#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1710.10733#davidstutzMon, 11 Feb 2019 18:26:32 +00001802.06627journals/corr/abs-1802-066272Robustness of Rotation-Equivariant Networks to Adversarial PerturbationsDavid StutzDumont et al. compare different adversarial transformation attacks (including rotations and translations) against common as well as rotation-invariant convolutional neural networks. On MNIST, CIFAR-10 and ImageNet, they consider translations, rotations as well as the attack of [1] based on spatial transformer networks. Additionally, they consider rotation-invariant convolutional neural networks – however, both the attacks and the networks are not discussed/introduced in detail. The results ar...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-06627#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-06627#davidstutzMon, 11 Feb 2019 18:11:30 +00001711.09115journals/corr/abs-1711-091152Geometric robustness of deep networks: analysis and improvementDavid StutzKanbak et al. propose ManiFool, a method to determine a network’s invariance to transformations by iteratively finding adversarial transformations. In particular, given a class of transformations to consider, ManiFool iteratively alternates two steps. First, a gradient step is taken in order to move into an adversarial direction; then, the obtained perturbation/direction is projected back to the space of allowed transformations. While the details are slightly more involved, I found that this a...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1711-09115#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1711-09115#davidstutzMon, 11 Feb 2019 18:06:41 +00001712.09665journals/corr/abs-1712-096652Adversarial PatchDavid StutzBrown et al. introduce a universal adversarial patch that, when added to an image, will cause a targeted misclassification. The concept is illustrated in Figure 1; essentially, a “sticker” is computed that, when placed randomly on an image, causes misclassification. In practice, the objective function optimized can be written as
$\max_p \mathbb{E}_{x\sim X, t \sim T, l \sim L} \log p(y|A(p,x,l,t))$
where $y$ is the target label and $X$, $T$ and $L$ are the data space, the transformation spa...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1712-09665#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1712-09665#davidstutzMon, 11 Feb 2019 18:03:29 +00001802.00420journals/corr/abs-1802-004203Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial ExamplesDavid StutzAthalye et al. propose methods to circumvent different types of defenses against adversarial examples based on obfuscated gradients. In particular, they identify three types of obfuscated gradients: shattered gradients (e.g., caused by undifferentiable parts of a network or through numerical instability), stochastic gradients, and exploding and vanishing gradients. These phenomena all influence the effectiveness of gradient-based attacks. Athalye et al. give several indicators of how to find out ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-00420#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-00420#davidstutzMon, 11 Feb 2019 17:56:52 +00001804.07729journals/corr/abs-1804-077292ADef: an Iterative Algorithm to Construct Adversarial DeformationsDavid StutzAlaifari et al. propose an iterative attack to construct adversarial deformations of images. In particular, and in contrast to general adversarial perturbations, adversarial deformations are described through a deformation vector field – and the corresponding norm of this vector field may be bounded; an illustration can be found in Figure 1. The adversarial deformation is computed iteratively where the deformation itself is expressed in a differentiable manner. In contrast to very simple trans...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1804-07729#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1804-07729#davidstutzMon, 11 Feb 2019 17:51:52 +00001806.11146journals/corr/abs-1806-111462Adversarial Reprogramming of Neural NetworksDavid StutzElsayed et al. use universal adversarial examples to reprogram neural networks in order to perform different tasks. In particular, e.g., on ImageNet, an adversarial example
$\delta = \tanh(W \cdot M)$
is computed where $M$ is a mask image (see Figure 1; in the paper the mask image essentially embeds a smaller image into an ImageNet-sized image) and $W$ is the adversarial perturbation itself (note that the notation was changed slightly for simplification). The hyperbolic tangent constrains the...
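A minimal sketch of the masked-perturbation construction (sizes and mask layout here are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
size, inner = 28, 10                     # hypothetical sizes, not the paper's

# Mask M: 1 on the border "frame", 0 where the small task image is embedded
M = np.ones((size, size))
lo = (size - inner) // 2
M[lo:lo + inner, lo:lo + inner] = 0.0

W = rng.normal(scale=3.0, size=(size, size))  # stand-in for learned weights
delta = np.tanh(W * M)                        # the adversarial "program"

assert np.all(np.abs(delta) < 1.0)            # tanh keeps it inside (-1, 1)
assert np.all(delta[lo:lo + inner, lo:lo + inner] == 0.0)  # untouched window
```

Because $\tanh(0) = 0$, the perturbation is exactly zero wherever the mask is zero, so the embedded task image stays intact while the frame is free to be optimized.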
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1806-11146#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1806-11146#davidstutzSun, 10 Feb 2019 18:57:07 +00001804.03286journals/corr/abs-1804-032862On the Robustness of the CVPR 2018 White-Box Adversarial Example DefensesDavid StutzAthalye and Carlini present experiments showing that pixel deflection [1] and high-level guided denoiser [2] are ineffective as defense against adversarial examples. In particular, they show that these defenses are not effective against the (currently) strongest first-order attack, projected gradient descent. Here, they also comment on the right threat model to use and explicitly state that the attacker would know the employed defense – which intuitively makes much sense when evaluating defens...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1804-03286#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1804-03286#davidstutzSun, 10 Feb 2019 18:44:29 +0000conf/nips/TeoGRS073Convex Learning with InvariancesDavid StutzTeo et al. propose a convex, robust learning framework allowing to integrate invariances into SVM training. In particular, they consider a set of valid transformations and define the cost of a training sample (i.e., pair of data and label) as the loss under the worst-case transformation – this definition is very similar to robust optimization or adversarial training. Then, a convex upper bound on this cost is derived. Given that the worst-case transformation can be found efficiently, two diff...
http://www.shortscience.org/paper?bibtexKey=conf/nips/TeoGRS07#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/nips/TeoGRS07#davidstutzSun, 10 Feb 2019 18:36:31 +0000conf/cvpr/DongLPS0HL182Boosting Adversarial Attacks With MomentumDavid StutzDong et al. introduce momentum into iterative white-box adversarial examples and also show that attacking ensembles of models improves transferability. Specifically, their contribution is twofold. First, some iterative white-box attacks are extended to include a momentum term. As in optimization or learning, the main motivation is to avoid local maxima and have faster convergence. In experiments, they show that momentum is able to increase the success rates of attacks.
Second, to improve the tr...
http://www.shortscience.org/paper?bibtexKey=conf/cvpr/DongLPS0HL18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/cvpr/DongLPS0HL18#davidstutzSun, 10 Feb 2019 18:03:00 +00001711.00851journals/corr/abs-1711-008512Provable defenses against adversarial examples via the convex outer adversarial polytopeDavid StutzWong and Kolter propose a method for learning provably-robust, deep, ReLU based networks by considering the so-called adversarial polytope of final-layer activations reachable through adversarial examples. Overall, the proposed approach has some similarities to adversarial training in that the overall objective can be written as
$\min_\theta \sum_{i = 1}^N \max_{\|\Delta\|_\infty \leq \epsilon} L(f_\theta(x_i + \Delta), y_i)$.
However, in contrast to previous work, the inner maximization prob...
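For context, the inner maximization in this objective is what standard adversarial training approximates with an attack such as PGD; the sketch below does exactly that on a logistic-regression loss (my own illustration; the paper's contribution is to replace this inner max with a certified convex bound instead):

```python
import numpy as np

def loss(w, x, y):
    # Binary logistic loss with label y in {-1, +1}
    return np.log1p(np.exp(-y * (w @ x)))

def pgd_linf(w, x, y, eps, steps=10):
    """Approximate max over ||delta||_inf <= eps of loss(w, x + delta, y)
    by projected gradient ascent."""
    alpha = 2.5 * eps / steps
    delta = np.zeros_like(x)
    for _ in range(steps):
        # d loss / d input = -y * sigmoid(-y * w.x) * w
        g = -y * w / (1.0 + np.exp(y * (w @ (x + delta))))
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)  # ascend, project
    return delta

w = np.array([1.0, -2.0, 0.5])
x = np.array([0.3, -0.1, 1.2])
y = 1.0
eps = 0.1
delta = pgd_linf(w, x, y, eps)
```

PGD only lower-bounds the inner max, which is why it yields empirical rather than provable robustness; the convex outer polytope gives an upper bound instead.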
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1711-00851#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1711-00851#davidstutzSun, 10 Feb 2019 17:56:35 +00001805.12152journals/corr/abs-1805-121522There Is No Free Lunch In Adversarial Robustness (But There Are Unexpected Benefits)David StutzTsipras et al. investigate the trade-off between classification accuracy and adversarial robustness. In particular, on a very simple toy dataset, they prove that such a trade-off exists; this means that very accurate models will also have low robustness. Overall, on this dataset, they find that there exists a sweet spot where the accuracy is 70% and the adversarial accuracy (i.e., accuracy on adversarial examples) is 70%. Using adversarial training to obtain robust networks, they additionally sh...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1805-12152#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1805-12152#davidstutzSun, 10 Feb 2019 17:45:22 +0000conf/nips/SchmidtSTTM184Adversarially Robust Generalization Requires More DataDavid StutzSchmidt et al. theoretically and experimentally show that training adversarially robust models requires a higher sample complexity compared to regular generalization. Theoretically, they analyze two very simple families of datasets, e.g., consisting of two Gaussian distributions corresponding to a two-class problem. On such datasets, they prove that “robust generalization”, i.e., generalization to adversarial examples, requires a much higher sample complexity than regular generalization,...
http://www.shortscience.org/paper?bibtexKey=conf/nips/SchmidtSTTM18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/nips/SchmidtSTTM18#davidstutzSun, 10 Feb 2019 17:37:53 +00001803.00940journals/corr/abs-1803-009402Protecting JPEG Images Against Adversarial AttacksDavid StutzMotivated by JPEG compression, Prakash et al. propose an adaptive quantization scheme as defense against adversarial attacks. They argue that JPEG experimentally reduces adversarial noise; however, it is difficult to automatically decide on the level of compression as it also influences a classifier’s performance. Therefore, Prakash et al. use a saliency detector to identify background region, and then apply adaptive quantization – with coarser detail at the background – to reduce the impa...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1803-00940#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1803-00940#davidstutzSun, 10 Feb 2019 17:30:41 +00001803.06373journals/corr/abs-1803-063733Adversarial Logit PairingDavid StutzKannan et al. propose a defense against adversarial examples called adversarial logit pairing where the logits of clean and adversarial example are regularized to be similar. In particular, during adversarial training, they add a regularizer of the form
$\lambda L(f(x), f(x’))$
where $L$ is, for example, the $L_2$ norm and $f(x’)$ the logits corresponding to adversarial example $x’$ (corresponding to clean example $x$). Intuitively, this is a very simple approach – adversarial training ...
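The regularizer itself takes only a few lines (the exact norm and batch averaging below are my choices, not taken from the paper):

```python
import numpy as np

def logit_pairing_loss(logits_clean, logits_adv, lam=0.5):
    # lam * mean over the batch of squared L2 distance between paired logits
    return lam * np.mean(np.sum((logits_clean - logits_adv) ** 2, axis=1))

clean = np.array([[2.0, -1.0], [0.5, 0.5]])
adv = np.array([[1.0, 0.0], [0.5, 0.5]])
```

Pairs whose logits already agree contribute nothing; mismatched pairs are penalized, pushing clean and adversarial representations together on top of the usual adversarial-training loss.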
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1803-06373#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1803-06373#davidstutzSun, 10 Feb 2019 17:25:06 +00001802.07124journals/corr/abs-1802-071244Out-distribution training confers robustness to deep neural networksDavid Stutz...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-07124#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-07124#davidstutzSun, 10 Feb 2019 16:56:11 +0000conf/cvpr/AkhtarLM182Defense Against Universal Adversarial PerturbationsDavid StutzAkhtar et al. propose a rectification and detection scheme as defense against universal adversarial perturbations. Their overall approach is illustrated in Figure 1 and briefly summarized as follows. Given a classifier with fixed weights, a rectification network (the so-called perturbation rectifying network – PRN) is trained in order to “undo” the perturbations. This network can be trained on a set of clean and perturbed images using the classifier’s loss. Second, based on the discrete c...
http://www.shortscience.org/paper?bibtexKey=conf/cvpr/AkhtarLM18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/cvpr/AkhtarLM18#davidstutzSun, 10 Feb 2019 16:56:11 +00001803.07994journals/corr/abs-1803-079942Adversarial Defense based on Structure-to-Signal AutoencodersDavid StutzFolz et al. propose an auto-encoder based defense against adversarial examples. In particular, they propose structure-to-signal auto-encoders, S2SNets, as defense mechanism – this auto-encoder is first trained in an unsupervised fashion to reconstruct images (which can be done independent of attack models or the classification network under attack). Then, the network’s decoder is fine tuned using gradients from the classification network. Their main argumentation is that the gradients of the...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1803-07994#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1803-07994#davidstutzSun, 10 Feb 2019 16:50:03 +00001812.09916journals/corr/1812.099163Improving MMD-GAN Training with Repulsive Loss Functionrichard_wth**TL;DR**: Rearranging the terms in Maximum Mean Discrepancy yields a much better loss function for the discriminator of Generative Adversarial Nets.
**Keywords**: Generative adversarial nets, Maximum Mean Discrepancy, spectral normalization, convolutional neural networks, Gaussian kernel, local stability.
**Summary**
Generative adversarial nets (GANs) are widely used to learn the data sampling process and are notoriously difficult to train. The training of GANs may be improved from three asp...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1812.09916#richardwth
http://www.shortscience.org/paper?bibtexKey=journals/corr/1812.09916#richardwthTue, 15 Jan 2019 05:07:15 +00001802.03685journals/corr/abs-1802-036852Learning a SAT Solver from Single-Bit SupervisionameroyerThe goal is to solve SAT problems with weak supervision: In that case a model is trained only to predict ***the satisfiability*** of a formula in conjunctive normal form. As a byproduct, when the formula is satisfiable, an actual satisfying assignment can be worked out by clustering the network's activations in most cases.
* **Pros (+):** Weak supervision, interesting structured architecture, seems to generalize nicely to harder problems by increasing the number message passing iterations.
...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-03685#ameroyer
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-03685#ameroyerMon, 14 Jan 2019 12:58:19 +00001809.01442journals/corr/1809.014423Data Augmentation for Skin Lesion AnalysisFábio Perez_Disclaimer: I'm the first author of this paper._
The code for this paper can be found at .
In this work, we wanted to compare different data augmentation scenarios for skin lesion analysis. We tried 13 scenarios, including commonly used augmentation techniques (color and geometry transformations), unusual ones (random erasing, elastic transformation, and a novel lesion mix to simulate collision lesions), and a combination of those.
Examples of the augmentation scenarios:
a) no augmentati...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.01442#fabioperez
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.01442#fabioperezMon, 14 Jan 2019 10:23:45 +00001802.10217journals/corr/1802.102174Investigating Human Priors for Playing Video GamesFábio PerezThe authors investigated why humans play some video games better than machines. That is the case for games that do not have continuous rewards (e.g., scores). They experimented with a game -- inspired by _Montezuma's Revenge_ -- in which the player has to climb stairs, collect keys and jump over enemies. RL algorithms can only know whether they succeeded when they finish the game, as there are no rewards during the gameplay, so they tend to do much worse than humans in these games.
To compare between humans ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.10217#fabioperez
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.10217#fabioperezFri, 28 Dec 2018 20:19:27 +00001710.10196journals/corr/abs-1710-101963Progressive Growing of GANs for Improved Quality, Stability, and VariationANIRUDH NJ
## **Keywords**
Progressive GAN , High resolution generator
---
## **Summary**
1. **Introduction**
1. **Goal of the paper**
1. Generation of very high quality images using progressively increasing size of the generator and discriminator.
1. Improved training and stability of GANs.
1. New metric for evaluating GAN results.
1. A high quality version of CELEBA-HQ dataset.
1. **Previous Research**
1. Generative methods help to produce new s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1710-10196#anirudhnj
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1710-10196#anirudhnjFri, 28 Dec 2018 18:33:17 +00001810.09136journals/corr/1810.091364Do Deep Generative Models Know What They Don't Know?ameroyerCNNs' predictions are known to be very sensitive to adversarial examples, which are samples generated to be wrongly classified with high confidence. On the other hand, probabilistic generative models such as `PixelCNN` and `VAEs` learn a distribution over the input domain, hence could be used to detect ***out-of-distribution inputs***, e.g., by estimating their likelihood under the data distribution. This paper provides interesting results showing that distributions learned by generative models a...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.09136#ameroyer
http://www.shortscience.org/paper?bibtexKey=journals/corr/1810.09136#ameroyerMon, 17 Dec 2018 10:20:46 +00001806.07366journals/corr/1806.073662Neural Ordinary Differential EquationswassnameSummary by senior author [duvenaud on hackernews]().
A few years ago, everyone switched their deep nets to "residual nets". Instead of building deep models like this:
h1 = f1(x)
h2 = f2(h1)
h3 = f3(h2)
h4 = f4(h3)
y = f5(h4)
They now build them like this:
h1 = f1(x) + x
h2 = f2(h1) + h1
h3 = f3(h2) + h2
h4 = f4(h3) + h3
y = f5(h4) + h4
Where f1, f2, etc are neural net layers. The idea is that it's easier to model a small change to an almost-correc...
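The residual update above is exactly one explicit Euler step of an ODE $h'(t) = f(h, t)$; a small sketch (mine, not the authors' code) showing that stacking more residual steps approaches the continuous solution for the toy dynamics $f(h) = -h$:

```python
import numpy as np

def resnet_forward(h0, f, n_layers, T=1.0):
    """n_layers residual blocks h <- h + dt * f(h): Euler integration."""
    dt = T / n_layers
    h = h0
    for _ in range(n_layers):
        h = h + dt * f(h)
    return h

f = lambda h: -h                 # known closed-form solution: h0 * exp(-t)
h0 = 1.0
exact = h0 * np.exp(-1.0)

shallow = resnet_forward(h0, f, n_layers=4)
deep = resnet_forward(h0, f, n_layers=1000)

# More layers = finer time steps = closer to the continuous-depth limit
assert abs(deep - exact) < abs(shallow - exact)
```

The paper's point is to take this limit seriously and hand the integration to an adaptive ODE solver instead of a fixed stack of layers.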
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.07366#wassname
http://www.shortscience.org/paper?bibtexKey=journals/corr/1806.07366#wassnameSun, 16 Dec 2018 04:33:03 +00001802.04865journals/corr/1802.048652Learning Confidence for Out-of-Distribution Detection in Neural Networkselbaro
## Summary
In prior work ('On Calibration of Modern Neural Networks'), temperature scaling is used for outputting confidence. This is done at inference time and does not change the existing classifier. This paper instead considers confidence at training time and outputs it directly from the network.
## Architecture
An additional branch for confidence is added after the penultimate layer, in parallel to logits and probs (Figure 2).
## Training
The network outputs the prob $p$ and...
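The training objective is cut off above; as a hedged sketch of the idea (my reading of the paper — the weight `lam` and the exact reduction are assumptions, not taken verbatim): predictions are interpolated toward the ground truth by the predicted confidence $c$, plus a $-\log c$ penalty so the network cannot always claim to be unsure.

```python
import numpy as np

def confidence_loss(p, c, y_onehot, lam=0.1):
    """Confidence-weighted NLL, sketched for a single example.
    p: predicted class probabilities, c: scalar confidence in (0, 1],
    y_onehot: one-hot label. lam (illustrative value) weights the
    -log(c) penalty that discourages permanently low confidence."""
    p_interp = c * p + (1.0 - c) * y_onehot   # hinted toward label when unsure
    nll = -np.sum(y_onehot * np.log(p_interp + 1e-12))
    return nll + lam * (-np.log(c + 1e-12))
```

Lowering $c$ buys a smaller task loss at the cost of the confidence penalty, which is what lets the network learn when it should be unsure.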
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.04865#elbaro
Mon, 10 Dec 2018 07:30:06 +0000

**Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks** (arXiv 1706.02690)
Summary by elbaro

## Task
Add a '**rejection**' output to an existing classification model with a softmax layer.
## Method
1. Choose a threshold $\delta$ and temperature $T$.
2. Add a perturbation to the input $x$ (eq. 2):
   let $\tilde x = x - \epsilon \text{sign}(-\nabla_x \log S_{\hat y}(x;T))$
3. If $p(\tilde x;T)\le \delta$, reject.
4. Otherwise, return the output of the original classifier.

$p(\tilde x;T)$ is the max probability with temperature scaling for input $\tilde x$.
$\delta$ and $T$ are chosen manually.
...
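The steps above can be sketched end-to-end for a linear classifier (logits $= Wx$), where the gradient in eq. 2 has a closed form. The values of $\epsilon$, $\delta$, and $T$ below are illustrative, not the paper's tuned ones.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max()          # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def odin_reject(W, x, T=1000.0, eps=0.0014, delta=0.5):
    # The rejection test sketched for a linear classifier: logits = W @ x.
    p = softmax(W @ x, T)
    yhat = int(np.argmax(W @ x))
    # Closed-form gradient of log S_yhat(x; T) for the linear model.
    grad = (W[yhat] - p @ W) / T
    x_tilde = x - eps * np.sign(-grad)        # eq. 2 perturbation
    p_tilde = softmax(W @ x_tilde, T).max()   # max prob after scaling
    return p_tilde <= delta                   # True -> reject as OOD
```

For a deep network the gradient would come from autodiff instead, but the decision rule is the same.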
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.02690#elbaro
Mon, 10 Dec 2018 07:17:15 +0000

**On Calibration of Modern Neural Networks** (arXiv 1706.04599)
Summary by elbaro

## Task
A neural network for classification typically has a **softmax** layer and outputs the class with the max probability. However, this probability does not represent the **confidence**. If the average confidence (the average of the max probabilities) over a dataset matches the accuracy, the model is called **well-calibrated**. Old models like LeNet (1998) were well-calibrated, but modern networks like ResNet (2016) are no longer well-calibrated. This paper explains what caused this and compares various calibration...
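A minimal sketch of the quantity involved: average confidence, which a well-calibrated model matches to its accuracy, with temperature scaling as the simplest fix the paper examines. The logits and temperature below are made-up.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def avg_confidence(logits, T=1.0):
    # Mean of per-sample max probabilities; a well-calibrated model
    # has this close to its accuracy on the same data.
    return softmax(logits, T).max(axis=1).mean()
```

Fitting a single $T > 1$ on a validation set (by minimizing NLL) shrinks over-confident probabilities without changing the argmax predictions.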
http://www.shortscience.org/paper?bibtexKey=journals/corr/1706.04599#elbaro
Mon, 10 Dec 2018 05:52:45 +0000

**Learning Latent Dynamics for Planning from Pixels** (arXiv 1811.04551)
Summary by wassname

**Summary**: This paper presents three tricks that make model-based reinforcement learning more reliable when tested in tasks that require walking and balancing. The tricks are 1) planning based on features, 2) using a recurrent network that mixes probabilistic and deterministic information, and 3) looking forward multiple steps.
**Longer summary**
Imagine playing pool, armed with a tablet that can predict exactly where the ball will bounce, and the next bounce, and so on. That would be a huge adva...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1811.04551#wassname
Sun, 09 Dec 2018 11:50:05 +0000

**Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning** (arXiv 1807.03146)
Summary by Krishna Murthy

What the paper is about:
KeypointNet learns the optimal set of 3D keypoints and their 2D detectors for a specified downstream task. The authors demonstrate this by extracting 3D keypoints and their 2D detectors for the task of relative pose estimation across views. They show that, using keypoints extracted by KeypointNet, relative pose estimates are superior to ones that are obtained from a supervised set of keypoints.
Approach:
Training samples for KeypointNet comprise two views (images) of a...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1807.03146#krishnamurthy
Thu, 06 Dec 2018 08:04:18 +0000

**The Reversible Residual Network: Backpropagation Without Storing Activations** (NIPS 2017)
Summary by ameroyer

Residual Networks (ResNets) have greatly advanced the state-of-the-art in Deep Learning by making it possible to train much deeper networks via the addition of skip connections. However, in order to compute gradients during the backpropagation pass, all the units' activations have to be stored during the feed-forward pass, leading to high memory requirements for these very deep networks.
Instead, the authors propose a **reversible architecture** based on ResNets, in which activations at one l...
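The reversibility trick fits in a few lines: activations are split into two halves, and each block's inputs can be reconstructed exactly from its outputs, so they need not be stored. `F` and `G` below are stand-ins for arbitrary shape-preserving residual functions, not the paper's actual sub-networks.

```python
import numpy as np

# Stand-ins for the block's residual functions (any shape-preserving maps).
F = lambda z: np.tanh(z)
G = lambda z: 0.5 * z

def rev_forward(x1, x2):
    # Forward pass of one reversible block.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2):
    # Exact reconstruction of the inputs from the outputs -- this is
    # what lets backprop recompute activations instead of storing them.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2
```

Note the inversion works for any `F` and `G`, since each is only ever added to the other stream.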
http://www.shortscience.org/paper?bibtexKey=conf/nips/GomezRUG17#ameroyer
Wed, 05 Dec 2018 15:14:10 +0000

**Visualizing the Loss Landscape of Neural Nets** (arXiv 1712.09913)
Summary by daisukelab

- Presents a simple visualization method based on “filter normalization.”
- Observed that __as networks become deeper, their loss landscapes become more chaotic__; this causes a dramatic drop in generalization and ultimately a lack of trainability.
- Observed that __skip connections promote flat minimizers and prevent the transition to chaotic behavior__; helps explain why skip connections are necessary for training extremely deep networks.
- Quantitatively measures non-convexity.
- S...
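The “filter normalization” in the first bullet is simple enough to sketch: a random direction in weight space is rescaled filter-by-filter to match the norms of the trained filters, making loss plots along that direction comparable across networks. The conv-weight layout below is an illustrative assumption.

```python
import numpy as np

def filter_normalize(direction, weights):
    # Rescale each filter of a random direction to the norm of the
    # corresponding trained filter. Conv weights assumed shaped
    # (out_channels, in_channels, h, w) -- one common convention.
    d = direction.reshape(direction.shape[0], -1)
    w = weights.reshape(weights.shape[0], -1)
    norm_d = np.linalg.norm(d, axis=1, keepdims=True)
    norm_w = np.linalg.norm(w, axis=1, keepdims=True)
    return (d * norm_w / (norm_d + 1e-10)).reshape(direction.shape)
```

The 1-D loss curve is then traced by evaluating the loss at `weights + alpha * filter_normalize(direction, weights)` for a range of `alpha`.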
http://www.shortscience.org/paper?bibtexKey=journals/corr/1712.09913#niz
Wed, 05 Dec 2018 13:58:02 +0000

**TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals** (arXiv 1703.06189)
Summary by shiyu

## Temporal unit regression network
Keywords: temporal action proposals; computational efficiency
**Summary**: In this paper, Jiyang Gao et al. design a proposal generation and refinement network with high computational efficiency by reusing unit features in coordinated regression and classification networks. In particular, a new metric for temporal proposals, AR-F, is introduced to meet two criteria: 1. evaluate different methods on the same dataset efficiently; 2. capable to evaluate same method'...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1703.06189#daisy
Wed, 05 Dec 2018 12:03:51 +0000