ShortScience.org Latest Summaries
http://www.shortscience.org/
60Sun, 18 Apr 2021 18:31:02 +0000journals/prl/BailoRJPBK182Efficient adaptive non-maximal suppression algorithms for homogeneous spatial keypoint distributionOleksandr BailoKeypoint detection is an important step in various tasks such as SLAM, panorama stitching, camera calibration, and more. Efficient keypoint detectors, FAST (Features from Accelerated Segment Test) for example, detect keypoints where a relatively high brightness change is observed in relation to surrounding pixels. Most probably, the keypoints would be located on edges, as shown below:
Let's consider another image shown below. Here, while the detector is capable of detecting many keyp...
http://www.shortscience.org/paper?bibtexKey=journals/prl/BailoRJPBK18#ukrdailo
http://www.shortscience.org/paper?bibtexKey=journals/prl/BailoRJPBK18#ukrdailoSun, 07 Feb 2021 10:58:53 +000010.1038/s41586-019-1923-72Improved protein structure prediction using potentials from deep learningCodyWildIn January of this year (2020), DeepMind released a model called AlphaFold, which uses convolutional networks atop sequence-based and evolutionary features to predict protein folding structure. In particular, their model was designed to predict a distribution for how far away each pair of amino acids will be from one another in the final folded structure. Given such a trained model, you can score a candidate structure according to how likely it is under the model, and - if your process for gener...
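The scoring idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the model's output is available as a table of probabilities over discrete distance bins for each residue pair, and the function name, bin edges, and toy numbers are all hypothetical.

```python
import math

def structure_log_likelihood(coords, pair_bin_probs, bin_edges):
    """Sum the log-probability assigned to each pair's observed distance bin.

    coords: list of 3D points, one per residue (hypothetical candidate structure).
    pair_bin_probs: dict mapping (i, j) with i < j to a list of bin probabilities.
    bin_edges: ascending edges; bin k covers [bin_edges[k], bin_edges[k + 1]).
    """
    n_bins = len(bin_edges) - 1
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            # Distances past the last edge are counted in the last bin.
            k = n_bins - 1
            for b in range(n_bins):
                if bin_edges[b] <= d < bin_edges[b + 1]:
                    k = b
                    break
            total += math.log(pair_bin_probs[(i, j)][k])
    return total

# Toy predicted distributions for three residues (assumed values).
bin_edges = [0.0, 4.0, 8.0, 16.0]
pair_bin_probs = {
    (0, 1): [0.7, 0.2, 0.1],
    (0, 2): [0.1, 0.8, 0.1],
    (1, 2): [0.7, 0.2, 0.1],
}
compact = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
stretched = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
score_compact = structure_log_likelihood(compact, pair_bin_probs, bin_edges)
score_stretched = structure_log_likelihood(stretched, pair_bin_probs, bin_edges)
```

A candidate whose pairwise distances fall in high-probability bins gets a higher score, so the score can be used to compare candidate structures.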
http://www.shortscience.org/paper?bibtexKey=10.1038/s41586-019-1923-7#decodyng
http://www.shortscience.org/paper?bibtexKey=10.1038/s41586-019-1923-7#decodyngTue, 01 Dec 2020 02:28:52 +00002007.12223journals/corr/abs-2007-122233The Lottery Ticket Hypothesis for Pre-trained BERT NetworksCodyWildThis is an interesting paper, investigating (with a team that includes the original authors of the Lottery Ticket paper) whether the initializations that result from BERT pretraining have Lottery Ticket-esque properties with respect to their role as initializations for downstream transfer tasks.
As background context, the Lottery Ticket Hypothesis came out of an observation that trained networks could be pruned to remove low-magnitude weights (according to a particular iterative pruning strate...
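As a rough illustration of what "removing low-magnitude weights" means, here is a single pruning step on a flat list of weights. It is a sketch under simplifying assumptions: the Lottery Ticket procedure is iterative and also involves resetting surviving weights, which is not shown, and the function name and toy weights are hypothetical.

```python
def magnitude_prune(weights, fraction):
    """Return a 0/1 mask that drops the `fraction` of weights with smallest |w|."""
    n_prune = int(len(weights) * fraction)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]

weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.2]
mask = magnitude_prune(weights, 0.5)
sparse_weights = [w * m for w, m in zip(weights, mask)]
```

The mask, rather than the pruned weights themselves, is the object of interest: it identifies the subnetwork that is kept.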
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2007-12223#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2007-12223#decodyngMon, 30 Nov 2020 01:54:47 +00001905.10295journals/corr/abs-1905-102952Learning to learn via Self-CritiqueMikhail Meskhi### Key points
- Instead of just focusing on supervised learning, a self-critique and adapt network provides an unsupervised learning approach to improving the overall generalization. It does this via transductive learning by learning a label-free loss function from the validation set to improve the base model.
- The SCA framework helps a learning algorithm be more robust by learning more relevant features and improving during the training phase.
### Ideas
1. Combine deep learning models with SC...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1905-10295#michaelmmeskhi
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1905-10295#michaelmmeskhiSat, 28 Nov 2020 21:58:53 +00002006.07589journals/corr/abs-2006-075892Adversarial Self-Supervised Contrastive LearningCodyWildThis is a nice, compact paper testing a straightforward idea: can we use the contrastive loss structure so widespread in unsupervised learning as a framework for generating and training against adversarial examples? In the context of the adversarial examples literature, adversarial training - or, training against examples that were adversarially generated so as to maximize the loss of the model you're training - is the primary strategy used to train robust models (robust here in the sense of not be...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-07589#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-07589#decodyngSat, 28 Nov 2020 21:00:26 +00002007.00224journals/corr/2007.002242Debiased Contrastive LearningCodyWildThe premise of contrastive loss is that we want to push together the representations of objects that are similar, and push dissimilar representations farther apart. However, in an unlabeled setting, we don't generally have class labels to tell which images (or objects in general) are supposed to be similar or dissimilar along the axes that matter to us, so we use the shortcut of defining some transformation on a given anchor frame that gets us a frame we're confident is related enough to that an...
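The push-together, push-apart premise can be made concrete with an InfoNCE-style loss. The summary does not name a specific loss, so this choice, the temperature value, and the toy vectors are assumptions made for illustration only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: low when the anchor is closer to the positive
    than to the negatives, as measured by dot-product similarity."""
    pos = math.exp(dot(anchor, positive) / temperature)
    neg = sum(math.exp(dot(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

anchor = [1.0, 0.0]
# Positive aligned with the anchor, negatives pointing elsewhere.
loss_good = contrastive_loss(anchor, [1.0, 0.0], [[-1.0, 0.0], [0.0, 1.0]])
# Positive pointing away from the anchor, one negative aligned with it.
loss_bad = contrastive_loss(anchor, [-1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Minimizing this loss raises the anchor-positive similarity relative to the anchor-negative similarities, which is the behavior the premise describes.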
http://www.shortscience.org/paper?bibtexKey=journals/corr/2007.00224#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/2007.00224#decodyngFri, 27 Nov 2020 21:00:39 +00002007.02835journals/corr/abs-2007-028353GROVER: Self-supervised Message Passing Transformer on Large-scale Molecular DataCodyWildLarge-scale transformers on unsupervised text data have been wildly successful in recent years; arguably, the most successful single idea in the last ~3 years of machine learning. Given that, it's understandable that different domains within ML want to take their shot at seeing whether the same formula will work for them as well. This paper applies the principles of (1) transformers and (2) large-scale unlabeled data to the problem of learning informative embeddings of molecular graphs.
Labeli...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2007-02835#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2007-02835#decodyngThu, 26 Nov 2020 20:44:45 +00002004.02860journals/corr/abs-2004-028602Weakly-Supervised Reinforcement Learning for Controllable BehaviorCodyWildI tried my best, but I'm really confused by the central methodology of this paper. Here are the things I do understand:
1. The goal of the method is to learn disentangled representations, and, specifically, to learn representations that correspond to factors of variation in the environment that are selected by humans. That means, we ask humans whether a given image is higher or lower on a particular relevant axis, and aggregate those rankings into a vector, where a particular index of the vect...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2004-02860#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2004-02860#decodyngThu, 26 Nov 2020 04:48:23 +00002002.11328yang2020rethinking2Rethinking Bias-Variance Trade-off for Generalization of Neural NetworksCodyWildThis is a really cool paper that posits a relatively simple explanation for the strange phenomenon known as double descent - both the fact of seeing it in the first place, and the difficulty in robustly causing it to appear. In the classical wisdom of statistics, increasing model complexity too far will lead to increase in variance, and thus an increase in test error (or "test risk" or "empirical risk"), leading to a U-shaped test error curve as a function of model complexity. Double descent is t...
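For reference, the classical decomposition behind the U-shaped curve can be written as below. The notation is introduced here and is not taken from the summary: $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is the model fit on a random training set, with the expectation taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$

In the classical picture, bias falls and variance rises as model complexity grows, and their sum produces the U shape.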
http://www.shortscience.org/paper?bibtexKey=yang2020rethinking#decodyng
http://www.shortscience.org/paper?bibtexKey=yang2020rethinking#decodyngTue, 24 Nov 2020 05:26:23 +00002006.15134journals/corr/2006.151343Critic Regularized RegressionCodyWildOffline reinforcement learning is a potentially high-value thing for the machine learning community to learn to do well, because there are many applications where it'd be useful to generate a learnt policy for responding to a dynamic environment, but where it'd be too unsafe or expensive to learn in an on-policy or online way, where we continually evaluate our actions in the environment to test their value. In such settings, we'd like to be able to take a batch of existing data - collected from a hum...
http://www.shortscience.org/paper?bibtexKey=journals/corr/2006.15134#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/2006.15134#decodyngMon, 23 Nov 2020 05:52:49 +00002006.06936journals/corr/abs-2006-069364Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?CodyWildThis paper is ultimately relatively straightforward, for all that it's embedded in the somewhat new-to-me literature around graph-based Neural Architecture Search - the problem of iterating through options to find a graph representing an optimized architecture. The authors want to understand whether in this problem, as in many others in deep learning, we can benefit from building our supervised models off of representations learned during an unsupervised pretraining step. In this case, the unsup...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-06936#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-06936#decodyngSun, 22 Nov 2020 02:10:17 +00002006.12433journals/corr/2006.124333What shapes feature representations? Exploring datasets, architectures, and trainingCodyWildThis is a nice little empirical paper that does some investigation into which features get learned during the course of neural network training. To look at this, it uses a notion of "decodability", defined as the accuracy to which you can train a linear model to predict a given conceptual feature on top of the activations/learned features at a particular layer. This idea captures the amount of information about a conceptual feature that can be extracted from a given set of activations.
They wo...
http://www.shortscience.org/paper?bibtexKey=journals/corr/2006.12433#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/2006.12433#decodyngSat, 21 Nov 2020 04:57:58 +00002007.01293ren2020unlabeled3Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised LearningCodyWildThis paper argues that, in semi-supervised learning, it's suboptimal to use the same weight for all examples (as happens implicitly, when the unsupervised component of the loss for each example is just added together directly). Instead, it tries to learn weights for each specific data example, through a meta-learning-esque process.
The form of semi-supervised learning being discussed here is label-based consistency loss, where a labeled image is augmented and run through the current version of ...
http://www.shortscience.org/paper?bibtexKey=ren2020unlabeled#decodyng
http://www.shortscience.org/paper?bibtexKey=ren2020unlabeled#decodyngFri, 20 Nov 2020 04:05:54 +00002007.14062journals/corr/abs-2007-140623Big Bird: Transformers for Longer SequencesCodyWildTransformers - powered by self-attention mechanisms - have been a paradigm shift in NLP, and are now the standard choice for training large language models. However, while transformers do have many benefits in terms of computational constraints - most saliently, that attention between tokens can be computed in parallel, rather than needing to be evaluated sequentially like in a RNN - a major downside is their memory (and, secondarily, computational) requirements. The baseline form of self-attent...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2007-14062#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2007-14062#decodyngThu, 19 Nov 2020 02:32:44 +00002006.07710journals/corr/abs-2006-077103The Pitfalls of Simplicity Bias in Neural NetworksCodyWildThis is an interesting paper that makes a fairly radical claim, and I haven't fully decided whether what they find is an interesting-but-rare corner case, or a more fundamental weakness in the design of neural nets. The claim is: neural nets prefer learning simple features, even if there exist complex features that are equally or more predictive, and even if that means learning a classifier with a smaller margin - where margin means "the distance between the decision boundary and the nearest-by ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-07710#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-07710#decodyngSun, 15 Nov 2020 22:46:11 +00002010.11924journals/corr/abs-2010-119242In Search of Robust Measures of GeneralizationCodyWildGeneralization is, if not the central, then at least one of the central mysteries of deep learning. We are somehow able to train high-capacity, overparametrized models, that empirically have the capacity to fit to random data - meaning that they have the capacity to memorize the labeled data we give them - and which yet still manage to train functions that generalize to test data. People have tried to come up with generalization bounds - that is, bounds on the expected test error of a mo...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2010-11924#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2010-11924#decodyngSat, 14 Nov 2020 22:31:16 +00002006.06882journals/corr/abs-2006-068823Rethinking Pre-training and Self-trainingCodyWild Occasionally, I come across results in machine learning that I'm glad exist, even if I don't fully understand them, precisely because they remind me how little we know about the complicated information architectures we're building, and what kinds of signal they can productively use. This is one such result.
The paper tests a method called self-training, and compares it against the more common standard of pre-training. Pre-training works by first training your model on a different dataset, in ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-06882#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-06882#decodyngSat, 14 Nov 2020 05:00:22 +00002010.02302journals/corr/abs-2010-023022Latent World Models For Intrinsically Motivated ExplorationCodyWildThe thing I think is happening here:
It proposes a self-supervised learning scheme (which...seems fairly basic, but okay) to generate encodings. It then trains a Latent World Model, which takes in the current state encoding, the action, and the belief state (I think just the prior RNN state?) and predicts a next state. The intrinsic reward is the difference between this and the actual encoding of the next step. (This is dependent on a particular action and resulting next obs, it seems). I don'...
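The intrinsic reward described above can be sketched as a prediction-error term. The summary only says "the difference" between predicted and actual encodings, so the squared Euclidean distance used here is an assumption, as are the function name and the toy encodings.

```python
def intrinsic_reward(predicted_next, actual_next):
    """Squared Euclidean distance between predicted and actual next-state encodings."""
    return sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))

# A transition the world model predicts perfectly earns no intrinsic reward.
reward_familiar = intrinsic_reward([0.0, 1.0], [0.0, 1.0])
# A surprising transition earns more.
reward_surprising = intrinsic_reward([1.0, 2.0], [0.0, 0.0])
```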
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2010-02302#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2010-02302#decodyngThu, 12 Nov 2020 05:26:18 +00001911.09071journals/corr/abs-1911-090713Exploring the Origins and Prevalence of Texture Bias in Convolutional Neural NetworksCodyWildWhen humans classify images, we tend to use high-level information about the shape and position of the object. However, when convolutional neural networks classify images, they tend to use low-level, or textural, information more than high-level shape information. This paper tries to understand what factors lead to higher shape bias or texture bias.
To investigate this, the authors look at three datasets with disagreeing shape and texture labels. The first is GST, or Geirhos Style Transfer. I...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1911-09071#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1911-09071#decodyngWed, 11 Nov 2020 07:08:22 +00002008.11687journals/corr/abs-2008-116873What is being transferred in transfer learning?CodyWildThis is an interesting - and refreshing - paper, in that, instead of trying to go all-in on a particular theoretical point, the authors instead run a battery of empirical investigations, all centered around the question of how to explain what happens to make transfer learning work. The experiments don't all line up to support a single point, but they do illustrate different interesting facets of the transfer process.
- An initial experiment tries to understand how much of the performance of fi...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2008-11687#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2008-11687#decodyngTue, 10 Nov 2020 06:58:27 +00002010.12050journals/corr/abs-2010-120503Contrastive Learning with Adversarial ExamplesCodyWildContrastive learning works by performing augmentations on a batch of images, and training a network to match the representations of the two augmented parts of a pair together, and push the representations of images not in a pair farther apart. Historically, these algorithms have benefitted from using stronger augmentations, which has the effect of making the two positive elements in a pair more visually distinct from one another. This paper tries to build on that success, and, beyond just using ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2010-12050#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2010-12050#decodyngMon, 09 Nov 2020 02:03:47 +00002004.11362journals/corr/2004.113623Supervised Contrastive LearningCodyWildThis was a really cool-to-me paper that asked whether contrastive losses, of the kind that have found widespread success in semi-supervised domains, can add value in a supervised setting as well. In a semi-supervised context, contrastive loss works by pushing together the representations of an "anchor" data example with an augmented version of itself (which is taken as a positive or target, because the image is understood to not be substantively changed by being augmented), and pushing the repre...
http://www.shortscience.org/paper?bibtexKey=journals/corr/2004.11362#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/2004.11362#decodyngSat, 07 Nov 2020 23:30:17 +00002006.10455journals/corr/abs-2006-104552What Do Neural Networks Learn When Trained With Random Labels?CodyWildThis is another paper that was a bit of a personal-growth test for me to try to parse, since it's definitely heavier on analytical theory than I'm used to, but I think I've been able to get something from it, even though I'll be the first to say I didn't understand it entirely.
The question of this paper is: why does it seem to be the case that training a neural network on a data distribution - but with your supervised labels randomly sampled - seems to afford some level of advantage when fine...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-10455#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-10455#decodyngSat, 07 Nov 2020 00:15:03 +00002007.13916journals/corr/abs-2007-139163Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset BiasesCodyWildIn the past year or so, contrastive learning has experienced widespread success, and has risen to be a dominant problem framing within self-supervised learning. The basic idea of contrastive learning is that, instead of needing human-generated labels to generate a supervised task, you instead assume that there exists some automated operation you can perform to a data element to generate another data element that, while different, should be considered still fundamentally the same, or at least mor...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2007-13916#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2007-13916#decodyngFri, 06 Nov 2020 04:39:42 +00002002.00632journals/corr/abs-2002-006323Effective Diversity in Population-Based Reinforcement LearningCodyWildA central problem in the domain of reinforcement learning is how to incentivize exploration and diversity of experience, since RL agents can typically only learn from states they go to, and it can often be the case that states with high reward don't have an obvious trail of high-reward states leading to them, meaning that algorithms that are naively optimizing for reward will be relatively unlikely to discover them. One potential way to promote exploration is to train an ensemble of agents, and ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2002-00632#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2002-00632#decodyngWed, 04 Nov 2020 00:44:40 +00002007.08794journals/corr/2007.087943Discovering Reinforcement Learning AlgorithmsCodyWildThis work attempts to use meta-learning to learn an update rule for a reinforcement learning agent. In this context, "learning an update rule" means learning the parameters of an LSTM module that takes in information about the agent's recent reward and current model and outputs two values - a scalar and a vector - that are used to update the agent's model. I'm not going to go too deep into meta-learning here, but, at a high level, meta learning methods optimize parameters governing an agent's le...
http://www.shortscience.org/paper?bibtexKey=journals/corr/2007.08794#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/2007.08794#decodyngTue, 03 Nov 2020 05:29:13 +00002006.04635journals/corr/abs-2006-046353Learning to Play No-Press Diplomacy with Best Response Policy IterationCodyWildThis paper focuses on an effort by a Deepmind team to train an agent that can play the game Diplomacy - a complex, multiplayer game where players play as countries controlling units, trying to take over the map of Europe. Some relevant factors of this game, for the purposes of this paper, are:
1) All players move at the same time, which means you need to model your opponent's current move, and play a move that succeeds in expectation over that predicted move distribution. This also means that,...
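Choosing a move that "succeeds in expectation" over a predicted opponent distribution can be sketched as below. The move names, payoff values, and probabilities are invented for illustration and have nothing to do with actual Diplomacy rules or the paper's agent.

```python
def best_move(my_moves, opponent_dist, payoff):
    """Pick the move with the highest expected payoff under the
    predicted distribution over the opponent's simultaneous move."""
    def expected(move):
        return sum(p * payoff(move, opp) for opp, p in opponent_dist.items())
    return max(my_moves, key=expected)

def toy_payoff(mine, theirs):
    # Hypothetical values: attacking into a defence loses, attacking a retreat wins.
    if mine == "attack":
        return -1.0 if theirs == "defend" else 2.0
    return 0.0  # holding is neutral either way

choice_vs_cautious = best_move(
    ["attack", "hold"], {"defend": 0.7, "retreat": 0.3}, toy_payoff
)
choice_vs_timid = best_move(
    ["attack", "hold"], {"defend": 0.4, "retreat": 0.6}, toy_payoff
)
```

The same payoff table yields different choices as the predicted opponent distribution shifts, which is why modeling the opponent's current move matters.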
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-04635#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2006-04635#decodyngMon, 02 Nov 2020 06:15:17 +000010.1101/2020.02.07.9388523Tumor Phylogeny Topology Inference via Deep LearningGavin GrayA very simple (but impractical) discrete model of subclonal evolution would include the following events:
* Division of a cell to create two cells:
* **Mutation** at a location in the genome of the new cells
* Cell death at a new timestep
* Cell survival at a new timestep
Because measurements of mutations are usually taken at one time point, this is taken to be at the end of a time series of these events, where a tiny subset of cells are observed and a **genotype matrix** $A$ is produce...
http://www.shortscience.org/paper?bibtexKey=10.1101/2020.02.07.938852#gngdb
http://www.shortscience.org/paper?bibtexKey=10.1101/2020.02.07.938852#gngdbWed, 16 Sep 2020 15:59:52 +00001805.08296journals/corr/1805.082962Data-Efficient Hierarchical Reinforcement LearningFelipe Martins# Keypoints
- Proposes the HIerarchical Reinforcement learning with Off-policy correction (**HIRO**) algorithm.
- Does not require careful task-specific design.
- Generic goal representation to make it broadly applicable, without any manual design of goal spaces, primitives, or controllable dimensions.
- Use of off-policy experience using a novel off-policy correction.
- A two-level hierarchy architecture
- A higher-level controller outputs a goal for the lower-level controller every **c** ti...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.08296#felipemartins
http://www.shortscience.org/paper?bibtexKey=journals/corr/1805.08296#felipemartinsTue, 01 Sep 2020 00:38:54 +000010.1109/isbi45749.2020.90986862Bayesian Skip-Autoencoders for Unsupervised Hyperintense Anomaly Detection in High Resolution Brain MriFriedrich-Maximilian WeberlingThe reconstruction of high-fidelity resolution brain MR images is especially challenging because of the highly complex brain structure. The most promising approaches for this task are autoencoders and generative models such as Variational Autoencoders (VAE) or Generative Adversarial Networks (GAN). In Unsupervised Anomaly Detection (UAD), these architectures are only trained with images of healthy brain anatomy and not with images containing anomalies such as lesions. Therefore, processing an anomal...
http://www.shortscience.org/paper?bibtexKey=10.1109/isbi45749.2020.9098686#fweberling1995
http://www.shortscience.org/paper?bibtexKey=10.1109/isbi45749.2020.9098686#fweberling1995Mon, 31 Aug 2020 09:18:08 +00001809.01999journals/corr/1809.019992Recurrent World Models Facilitate Policy EvolutionPaul Barde## General Framework
The take-home message is that the challenge of Reinforcement Learning for environments with high-dimensional and partial observations is learning a good representation of the environment. This means learning a sensory feature extractor V to deal with the high-dimensional observations (pixels, for example), but also learning a temporal representation M of the environment dynamics to deal with the partial observability. If provided with such representations, learning a contr...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.01999#muntermulehitch
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.01999#muntermulehitchMon, 27 Jul 2020 13:05:14 +00001907.03976journals/corr/1907.039763Better-than-Demonstrator Imitation Learning via Automatically-Ranked DemonstrationsPaul Barde## General Framework
Extends T-REX (see [summary]()) so that preferences (rankings) over demonstrations are generated automatically (back to the common IL/IRL setting where we only have access to a set of unlabeled demonstrations). Also derives some theoretical requirements and guarantees for better-than-demonstrator performance.
## Motivations
* Preferences over demonstrations may be difficult to obtain in practice.
* There is no theoretical understanding of the requirements that lead to out...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1907.03976#muntermulehitch
http://www.shortscience.org/paper?bibtexKey=journals/corr/1907.03976#muntermulehitchMon, 27 Jul 2020 02:22:27 +00001904.06387journals/corr/1904.063872Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from ObservationsPaul Barde## General Framework
Only access to a finite set of **ranked demonstrations**. The demonstrations only contain **observations** and **do not need to be optimal** but must be (approximately) ranked from worst to best.
The **reward learning part is off-line** but not the policy learning part (requires interactions with the environment).
In a nutshell: learns a reward model that looks at observations. The reward model is trained to predict if a demonstration's ranking is greater than another on...
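One common way to train a reward model on ranked pairs is a pairwise cross-entropy loss over the summed predicted rewards of the two demonstrations. The sketch below assumes that form; the summary does not spell out the exact loss, and the stand-in reward function and toy trajectories are hypothetical.

```python
import math

def ranking_loss(reward_fn, worse_traj, better_traj):
    """Negative log-probability that the better-ranked trajectory wins,
    with win probability given by a softmax over summed predicted rewards."""
    r_worse = sum(reward_fn(obs) for obs in worse_traj)
    r_better = sum(reward_fn(obs) for obs in better_traj)
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(r_worse, r_better)
    log_z = m + math.log(math.exp(r_worse - m) + math.exp(r_better - m))
    return -(r_better - log_z)

def toy_reward(obs):
    # Stand-in for a learned reward model: here the observation is its own reward.
    return obs

loss_consistent = ranking_loss(toy_reward, [0.0, 0.0], [1.0, 1.0])
loss_inconsistent = ranking_loss(toy_reward, [1.0, 1.0], [0.0, 0.0])
```

The loss is small when the reward model already agrees with the ranking and large when it contradicts it, so gradient steps on it pull predicted rewards toward the given order.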
http://www.shortscience.org/paper?bibtexKey=journals/corr/1904.06387#muntermulehitch
http://www.shortscience.org/paper?bibtexKey=journals/corr/1904.06387#muntermulehitchMon, 27 Jul 2020 02:18:47 +000010.15607/rss.2016.xii.0292Planning for Autonomous Cars that Leverage Effects on Human ActionsPaul Barde## General Framework
*wording: car = the autonomous car, driver = the other car it is interacting with*
Builds a model of an **autonomous car's influence over the behavior of an interacting driver** (human or simulated) that the autonomous car can leverage to plan more efficiently. The driver is modeled by the policy that maximizes his defined objective. In brief, a **linear reward function is learned off-line with IRL on human demonstrations** and the modeled policy takes the actions that max...
http://www.shortscience.org/paper?bibtexKey=10.15607/rss.2016.xii.029#muntermulehitch
http://www.shortscience.org/paper?bibtexKey=10.15607/rss.2016.xii.029#muntermulehitchMon, 27 Jul 2020 02:14:17 +00001406.5979journals/corr/1406.59792Reinforcement and Imitation Learning via Interactive No-Regret LearningPaul Barde## General Framework
Really **similar to DAgger** (see [summary]()) but considers **cost-sensitive classification** ("some mistakes are worse than others": you should be more careful in imitating that particular action of the expert if failing to do so incurs a large cost-to-go). By doing so they improve from DAgger's bound of $\epsilon_{class}uT$ where $u$ is the difference in cost-to-go (between the expert and one error followed by expert policy) to $\epsilon_{class}T$ where $\epsilon_{cla...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1406.5979#muntermulehitch
http://www.shortscience.org/paper?bibtexKey=journals/corr/1406.5979#muntermulehitchMon, 27 Jul 2020 02:08:30 +00001011.0686journals/corr/1011.06862A Reduction of Imitation Learning and Structured Prediction to No-Regret Online LearningPaul Barde## General Framework
The imitation learning problem is here cast into a classification problem: label the state with the corresponding expert action. With this, you can see structured prediction (predict next label knowing your previous prediction) as a degenerate IL problem. They make the **reduction assumption** that you can make the probability of mistake $\epsilon$ as small as desired on the **training distribution** (expert or mixture). They also assume that the difference in the cost-to-g...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1011.0686#muntermulehitch
http://www.shortscience.org/paper?bibtexKey=journals/corr/1011.0686#muntermulehitchMon, 27 Jul 2020 01:53:35 +00001611.03530journals/corr/1611.035302Understanding deep learning requires rethinking generalizationANIRUDH NJ## Summary
The broad goal of this paper is to understand how a neural network learns the underlying distribution of the input data and the properties of the network that describe its generalization power.
Previous literature tries to use statistical measures like Rademacher complexity, uniform stability and VC dimension to explain the generalization error of the model. These methods explain generalization in terms of the number of parameters in the model along with the applied regularizat...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.03530#anirudhnj
http://www.shortscience.org/paper?bibtexKey=journals/corr/1611.03530#anirudhnjFri, 26 Jun 2020 15:33:03 +0000journals/af/Maymin112Markets are efficient if and only if P = NPquaxtonIs the market efficient? This is perhaps the most prevalent question in all of finance. While this paper does not aim to answer that question, it does frame it in an information-theoretic context. Mainly, Maymin shows that at least the weak form of the efficient market hypothesis (EMH) holds if and only if P = NP.
First, he defines what efficient market means:
"The weakest form of the EMH states that future prices cannot be predicted by analyzing prices from the past. Therefore, technical ana...
http://www.shortscience.org/paper?bibtexKey=journals/af/Maymin11#jyang772
http://www.shortscience.org/paper?bibtexKey=journals/af/Maymin11#jyang772Thu, 04 Jun 2020 02:53:53 +0000conf/iclr/RendaFC203Comparing Rewinding and Fine-tuning in Neural Network PruningCodyWildThis is an interestingly pragmatic paper that makes a super simple observation. Often, we may want a usable network with fewer parameters, to make our network more easily usable on small devices. It's been observed (by these same authors, in fact), that pruned networks can achieve comparable accuracy to their fully trained counterparts if you rewind and retrain from early in the training process, to compensate for the loss of the (not ultimately important) pruned weights. This observation has bee...
http://www.shortscience.org/paper?bibtexKey=conf/iclr/RendaFC20#decodyng
http://www.shortscience.org/paper?bibtexKey=conf/iclr/RendaFC20#decodyngFri, 15 May 2020 03:18:21 +00002004.13649journals/corr/2004.136492Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from PixelsCodyWildOne of the most notable flaws of modern model-free reinforcement learning is its sample inefficiency; where humans can learn a new task with relatively few examples, models that learn policies or value functions directly from raw data need huge amounts of data to train properly. Because the model isn't given any semantic features, it has to learn a meaningful representation from raw pixels using only the (often sparse, often noisy) signal of reward. Some past approaches have tried learning repres...
http://www.shortscience.org/paper?bibtexKey=journals/corr/2004.13649#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/2004.13649#decodyngSun, 10 May 2020 05:46:18 +00001903.11981journals/corr/abs-1903-119813Regularizing Trajectory Optimization with Denoising AutoencodersRobert MüllerThe typical model-based reinforcement learning (RL) loop consists of collecting data, training a model of the environment, and using the model to do model predictive control (MPC). If, however, the model is wrong, for example for state-action pairs that have barely been visited, the dynamics model might be very wrong and the MPC fails because the imagined model and reality no longer align. Boney et al. propose to tackle this with a denoising autoencoder for trajectory regularization according to the fam...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-11981#robertmueller
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-11981#robertmuellerThu, 07 May 2020 08:08:00 +00001912.05500journals/corr/abs-1912-055002What Can Learned Intrinsic Rewards Capture?CodyWildThis paper out of DeepMind is an interesting synthesis of ideas out of the research areas of meta learning and intrinsic rewards. The hope for intrinsic reward structures in reinforcement learning - things like uncertainty reduction or curiosity - is that they can incentivize behavior like information-gathering and exploration, which aren't incentivized by the explicit reward in the short run, but which can lead to higher total reward in the long run. So far, intrinsic rewards have mostly been ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1912-05500#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1912-05500#decodyngTue, 05 May 2020 06:22:03 +0000conf/icml/FinnAL172Model-Agnostic Meta-Learning for Fast Adaptation of Deep NetworksAndrea Walter Ruggerini## TL;DR
The paper presents a model-agnostic strategy to perform few-shot learning taking advantage of prior knowledge acquired during multitask learning. Such prior knowledge derives from priors acquired about generalized model parameters (e.g. weights or hyperparameters) during the Model Agnostic Meta-Learning (MAML) algorithm. The strategy can be applied to any algorithm trained with gradient descent (not only neural networks), being more general and perhaps more effective than transfer learnin...
http://www.shortscience.org/paper?bibtexKey=conf/icml/FinnAL17#andreaw
http://www.shortscience.org/paper?bibtexKey=conf/icml/FinnAL17#andreawSun, 03 May 2020 14:29:05 +00002001.04451journals/corr/2001.044512Reformer: The Efficient TransformerCodyWildThe Transformer architecture - which uses a structure entirely based on key-value attention mechanisms to process sequences such as text - has taken over the worlds of language modeling and NLP in the past three years. However, Transformers at the scale used for large language models have huge computational and memory requirements.
This is largely driven by the fact that information at every step in the sequence (or, in the so-far-generated sequence during generation) is used to inform the rep...
http://www.shortscience.org/paper?bibtexKey=journals/corr/2001.04451#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/2001.04451#decodyngSun, 03 May 2020 05:14:23 +00001909.11655journals/corr/abs-1909-116552Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical SpaceCodyWildI found this paper a bit difficult to fully understand. Its premise, as far as I can follow, is that we may want to use genetic algorithms (GA), where we make modifications to elements in a population, and keep elements around at a rate proportional to some set of their desirable properties. In particular we might want to use this approach for constructing molecules that have properties (or predicted properties) we want. However, a downside of GA is that it's easy to end up in local minima, where...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1909-11655#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1909-11655#decodyngFri, 01 May 2020 05:38:46 +0000conf/nips/KumarFSTL193Stabilizing Off-Policy Q-Learning via Bootstrapping Error ReductionRobert MüllerKumar et al. propose an algorithm to learn in batch reinforcement learning (RL), a setting where an agent learns purely from a fixed batch of data, $B$, without any interactions with the environment. The data in the batch is collected according to a batch policy $\pi_b$. Whereas most previous methods (like BCQ) constrain the learned policy to stay close to the behavior policy, Kumar et al. propose bootstrapping error accumulation reduction (BEAR), which constrains the newly learned policy to pl...
http://www.shortscience.org/paper?bibtexKey=conf/nips/KumarFSTL19#robertmueller
http://www.shortscience.org/paper?bibtexKey=conf/nips/KumarFSTL19#robertmuellerThu, 30 Apr 2020 13:31:29 +000010.1101/2020.03.03.9721332AI-aided design of novel targeted covalent inhibitors against SARS-CoV-2CodyWildThis preprint is a bit rambling, and I don't know that I fully followed what it was doing, but here's my best guess:
- We think it's probably the case that SARS-CoV-2 (COVID-19) uses a protease (enzyme involved in its reproduction) that isn't available and co-optable in the human body, and is also quite similar to the comparable protease protein in the original SARS virus. Therefore, it is hoped that we might be able to take inhibitors that bind to SARS, and modify them in small ways to make t...
http://www.shortscience.org/paper?bibtexKey=10.1101/2020.03.03.972133#decodyng
http://www.shortscience.org/paper?bibtexKey=10.1101/2020.03.03.972133#decodyngThu, 30 Apr 2020 04:36:33 +00002003.03123journals/corr/abs-2003-031232Directional Message Passing for Molecular GraphsCodyWildThis paper, presented this week at ICLR 2020, builds on existing applications of message-passing Graph Neural Networks (GNN) for molecular modeling (specifically: for predicting quantum properties of molecules), and extends them by introducing a way to represent angles between atoms, rather than just distances between them, as current methods are limited to.
The basic version of GNNs on molecule data works by creating features attached to atoms at each level (starting at level 0 with the eleme...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2003-03123#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-2003-03123#decodyngWed, 29 Apr 2020 03:42:52 +00001911.11361journals/corr/abs-1911-113613Behavior Regularized Offline Reinforcement LearningRobert MüllerWu et al. provide a framework (behavior regularized actor critic (BRAC)) which they use to empirically study the impact of different design choices in batch reinforcement learning (RL). Specific instantiations of the framework include BCQ, KL-Control and BEAR.
Pure off-policy RL describes the problem of learning a policy purely from a batch $B$ of one-step transitions collected with a behavior policy $\pi_b$. The setting allows for no further interactions with the environment. This learning re...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1911-11361#robertmueller
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1911-11361#robertmuellerMon, 27 Apr 2020 13:02:23 +00001908.06760journals/corr/abs-1908-067602Self-Attention Based Molecule Representation for Predicting Drug-Target InteractionCodyWildIn the last three years, Transformers, or models based entirely on attention for aggregating information from across multiple places in a sequence, have taken over the world of NLP. In this paper, the authors propose using a Transformer to learn a molecular representation, and then building a model to predict drug/target interaction on top of that learned representation. A drug/target interaction model takes in two inputs - a protein involved in a disease pathway, and a (typically small) molecul...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1908-06760#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1908-06760#decodyngSun, 26 Apr 2020 06:39:30 +0000journals/iacr/BellareRRS092Format-Preserving EncryptionquaxtonFormat-preserving encryption is a deterministic encryption scheme that encrypts plaintext of some specified format into ciphertext of the same format. This has a lot of practical use cases such as storing SSN or credit card information, without having to change the underlying schema of the database or application that stores the data. The protected data is indistinguishable from unprotected data, and still enables some analytics over it, such as with masking (i.e., displaying last four digits ...
http://www.shortscience.org/paper?bibtexKey=journals/iacr/BellareRRS09#jyang772
http://www.shortscience.org/paper?bibtexKey=journals/iacr/BellareRRS09#jyang772Thu, 23 Apr 2020 22:05:16 +0000conf/ac/Rasmussen034Gaussian Processes in Machine LearningFriedrich-Maximilian WeberlingIn this tutorial paper, Carl E. Rasmussen gives an introduction to Gaussian Process Regression focusing on the definition, the hyperparameter learning and future research directions.
A Gaussian Process is completely defined by its mean function $m(\pmb{x})$ and its covariance function (kernel) $k(\pmb{x},\pmb{x}')$. The mean function $m(\pmb{x})$ corresponds to the mean vector $\pmb{\mu}$ of a Gaussian distribution whereas the covariance function $k(\pmb{x}, \pmb{x}')$ corresponds to the covari...
http://www.shortscience.org/paper?bibtexKey=conf/ac/Rasmussen03#fweberling1995
http://www.shortscience.org/paper?bibtexKey=conf/ac/Rasmussen03#fweberling1995Tue, 21 Apr 2020 20:05:41 +00001903.08254journals/corr/abs-1903-082543Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context VariablesRobert MüllerRakelly et al. propose a method to do off-policy meta reinforcement learning (RL). The method achieves a 20-100x improvement in sample efficiency compared to on-policy meta RL like MAML+TRPO.
The key difficulty for offline meta RL arises from the meta-learning assumption that meta-training and meta-test time match. However, at test time the policy has to explore and therefore sees on-policy data, in contrast to the off-policy data that should be used at meta-training. The key contrib...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-08254#robertmueller
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1903-08254#robertmuellerTue, 21 Apr 2020 08:39:21 +000010.1093/bioinformatics/bty5732Predicting protein–protein interactions through sequence-based deep learningCodyWildMost of the interesting mechanics within living things are mediated by interactions between proteins, making it important and useful to have good predictive models of whether proteins will interact with one another, for validating possible interaction graph structures.
Prior methods for this problem - which takes as its input sequence representations of two proteins, and outputs a probability of interaction - have pursued different ideas for how to combine information from the two proteins. On...
http://www.shortscience.org/paper?bibtexKey=10.1093/bioinformatics/bty573#decodyng
http://www.shortscience.org/paper?bibtexKey=10.1093/bioinformatics/bty573#decodyngTue, 21 Apr 2020 06:36:31 +00001906.05374journals/corr/1906.053743Meta-Learning via Learned LossRobert MüllerBechtle et al. propose meta learning via learned loss ($ML^3$) and derive and empirically evaluate the framework on classification, regression, model-based and model-free reinforcement learning tasks.
The problem is formalized as learning parameters $\Phi$ of a meta loss function $M_{\Phi}$ that computes loss values $L_{learned} = M_{\Phi}(y, f_{\theta}(x))$. Following the outer-inner loop meta algorithm design, the learned loss $L_{learned}$ is used to update the parameters of the learner in the...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1906.05374#robertmueller
http://www.shortscience.org/paper?bibtexKey=journals/corr/1906.05374#robertmuellerMon, 20 Apr 2020 16:28:20 +00001802.04364journals/corr/abs-1802-043642Junction Tree Variational Autoencoder for Molecular Graph GenerationCodyWildPrior to this paper, most methods that used machine learning to generate molecular blueprints did so using SMILES representations - a string format with characters representing different atoms and bond types. This preference came about because ML had existing methods for generating strings that could be built on for generating SMILES (a particular syntax of string). However, an arguably more accurate and fundamental way of representing molecules is as graphs (with atoms as nodes and bonds as edg...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-04364#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1802-04364#decodyngMon, 20 Apr 2020 04:48:28 +00001705.10843journals/corr/GuimaraesSFA172Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation ModelsCodyWildThis paper's proposed method, the cleverly named ORGAN, combines techniques from GANs and reinforcement learning to generate candidate molecular sequences that incentivize desirable properties while still remaining plausibly on-distribution.
Prior papers I've read on molecular generation have by and large used approaches based in maximum likelihood estimation (MLE) - where you construct some distribution over molecular representations, and maximize the probability of your true data under that ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/GuimaraesSFA17#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/GuimaraesSFA17#decodyngSat, 18 Apr 2020 04:57:12 +0000journals/jcheminf/OlivecronaBEC172Molecular de-novo design through deep reinforcement learningCodyWildOver the past few days, I've been reading about different generative neural networks being tried out for molecular generation. So far this has mostly focused on latent variable space models like autoencoders, but today I shifted attention to a different approach rooted in reinforcement learning. The goal of most of these methods is 1) to build a generative model that can sample plausible molecular structures, but more saliently 2) specifically generate molecules optimized to exhibit some propert...
http://www.shortscience.org/paper?bibtexKey=journals/jcheminf/OlivecronaBEC17#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/jcheminf/OlivecronaBEC17#decodyngFri, 17 Apr 2020 06:00:27 +00001908.09791journals/corr/abs-1908-097912Once for All: Train One Network and Specialize it for Efficient Deploymentameroyer**Summary**: The goal of this work is to propose a "Once-for-all" (OFA) network: a large network which is trained such that its subnetworks (subsets of the network with smaller width, convolutional kernel sizes, shallower units) are also trained towards the target task. This allows the architecture to be adapted to a given budget at inference time while preserving performance.
**Elastic Parameters.**
The goal is to train a large architecture that contains several well-trained subnetworks with dif...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1908-09791#ameroyer
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1908-09791#ameroyerThu, 16 Apr 2020 17:48:55 +00001610.02415journals/corr/Gomez-Bombarelli163Automatic chemical design using a data-driven continuous representation of moleculesCodyWildI'll admit that I found this paper a bit of a letdown to read, relative to expectations rooted in its high citation count, and my general excitement and interest to see how deep learning could be brought to bear on molecular design. But before a critique, let's first walk through the mechanics of how the authors' approach works.
The method proposed is basically a very straightforward Variational Auto Encoder, or VAE. It takes in a textual SMILES string representation of a molecular structure,...
http://www.shortscience.org/paper?bibtexKey=journals/corr/Gomez-Bombarelli16#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/Gomez-Bombarelli16#decodyngWed, 15 Apr 2020 03:11:44 +0000journals/iacr/BrakerskiV112Efficient Fully Homomorphic Encryption from (Standard) LWEquaxtonBrakerski and Vaikuntanathan introduce a fully homomorphic encryption (FHE) scheme based solely on the decisional learning with errors (LWE) security assumption, moving away from the relatively obscure mathematics of ideal lattices. They introduce relinearization and modulus switching techniques for dimensionality reduction and for removing the “squashing” step of Craig Gentry’s FHE scheme. BV11 and other similar schemes are commonly referred to as “Second generation FHE” schemes.
R...
http://www.shortscience.org/paper?bibtexKey=journals/iacr/BrakerskiV11#jyang772
http://www.shortscience.org/paper?bibtexKey=journals/iacr/BrakerskiV11#jyang772Mon, 13 Apr 2020 02:16:23 +00001704.01212journals/corr/GilmerSRVD174Neural Message Passing for Quantum ChemistryCodyWildIn the years before this paper came out in 2017, a number of different graph convolution architectures - which use weight-sharing and order-invariant operations to create representations at nodes in a graph that are contextualized by information in the rest of the graph - had been suggested for learning representations of molecules. The authors of this paper out of Google sought to pull all of these proposed models into a single conceptual framework, for the sake of better comparing and testing ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/GilmerSRVD17#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/GilmerSRVD17#decodyngFri, 10 Apr 2020 06:05:16 +00001708.09259journals/corr/1708.092592Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNethanoch kremerScatterNets incorporate geometric knowledge of images to produce discriminative and invariant (to translation and rotation) features, i.e. edge information. The first layers of a CNN produce the same outcome. So why not replace those first layers with an equivalent, fixed structure and let the optimizer find the best weights for the CNN with its leading edge removed?
The main motivations of the idea of replacing the first convolutional, ReLU and pooling layers of the CNN with a two-layer parametric log-b...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.09259#hanochkremer
http://www.shortscience.org/paper?bibtexKey=journals/corr/1708.09259#hanochkremerThu, 09 Apr 2020 12:05:38 +000010.1111/j.1467-9965.1991.tb00002.x3Universal PortfoliosquaxtonCover's Universal Portfolio is an information-theoretic portfolio optimization algorithm that utilizes constantly rebalanced portfolios (CRP). A CRP is one in which the distribution of wealth among stocks in the portfolio remains the same from period to period. Universal Portfolio strictly performs rebalancing based on historical pricing, making no assumptions about the underlying distribution of the prices.
The wealth achieved by a CRP over n periods is:
$S_n(b,x^n) = \displaystyle \prod_{n}...
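As a hedged illustration of the CRP definition above, here is a minimal Python sketch. It assumes the standard formulation in which each period's wealth multiplier is the inner product of the fixed allocation `b` with that period's vector of price relatives (price at the end of the period divided by price at the start). The function name `crp_wealth` and the toy numbers are hypothetical and are not taken from the paper or the summary.

```python
import math


def crp_wealth(b, price_relatives):
    """Wealth multiplier of a constantly rebalanced portfolio.

    b: fixed fractions of wealth per stock (assumed non-negative, summing to 1).
    price_relatives: one vector per period, each entry being
        (price at end of period) / (price at start of period).
    """
    # Rebalancing back to b every period means each period multiplies
    # wealth by the b-weighted average of that period's price relatives.
    return math.prod(
        sum(b_i * x_i for b_i, x_i in zip(b, x)) for x in price_relatives
    )


# Toy example: stock 1 doubles and then halves, stock 2 stays flat.
# Buying and holding either stock ends with a multiplier of 1.0.
wealth = crp_wealth([0.5, 0.5], [[2.0, 1.0], [0.5, 1.0]])
print(wealth)  # 1.5 * 0.75 = 1.125
```

In this toy case the 50/50 rebalanced portfolio ends ahead of either stock held on its own, which is the kind of effect that motivates studying CRPs.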
http://www.shortscience.org/paper?bibtexKey=10.1111/j.1467-9965.1991.tb00002.x#jyang772
http://www.shortscience.org/paper?bibtexKey=10.1111/j.1467-9965.1991.tb00002.x#jyang772Wed, 08 Apr 2020 23:17:22 +00001611.03199journals/corr/Altae-TranRPP162Low Data Drug Discovery with One-shot LearningCodyWildThe goal of one-shot learning tasks is to design a learning structure that can perform a new task (or, more canonically, add a new class to an existing task) using only one or a small number of examples of the new task or class. So, as an example: you'd want to be able to take one positive and one negative example of a given task and correctly classify subsequent points as either positive or negative. A common way of achieving this, and the way that the paper builds on, is to learn a parametrized f...
http://www.shortscience.org/paper?bibtexKey=journals/corr/Altae-TranRPP16#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/Altae-TranRPP16#decodyngWed, 08 Apr 2020 05:11:54 +00001703.00564journals/corr/WuRFGGPLP172MoleculeNet: A Benchmark for Molecular Machine LearningCodyWildThis is a paper released by the creators of the DeepChem library/framework, explaining the efforts they've put into facilitating straightforward and reproducible testing of new methods. They advocate for consistency between tests on three main axes.
1. On the most basic level, that methods evaluate on the same datasets
2. That they use canonical train/test splits
3. That they use canonical metrics.
To that end, they've integrated a framework they call "MoleculeNet" into DeepChem, containing ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/WuRFGGPLP17#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/WuRFGGPLP17#decodyngTue, 07 Apr 2020 04:15:48 +00001509.09292journals/corr/DuvenaudMAGHAA153Convolutional Networks on Graphs for Learning Molecular FingerprintsCodyWildIf you read modern (that is, 2018-2020) papers using deep learning on molecular inputs, almost all of them use some variant of graph convolution. So, I decided to go back through the citation chain and read the earliest papers that thought to apply this technique to molecules, to get an idea of lineage of the technique within this domain.
This 2015 paper, by Duvenaud et al, is the earliest one I can find. It focuses the entire paper on comparing differentiable, message-passing networks to the ...
http://www.shortscience.org/paper?bibtexKey=journals/corr/DuvenaudMAGHAA15#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/DuvenaudMAGHAA15#decodyngMon, 06 Apr 2020 16:05:21 +00001603.00856journals/corr/KearnesMBPR163Molecular Graph Convolutions: Moving Beyond FingerprintsCodyWildThis paper was published after the 2015 Duvenaud et al paper proposing a differentiable alternative to circular fingerprints of molecules: substituting out exact-match random hash functions to identify molecular structures with learned convolutional-esque kernels. As far as I can tell, the Duvenaud paper was the first to propose something we might today recognize as graph convolutions on atoms. I hoped this paper would build on that one, but it seems to be coming from a conceptually different di...
http://www.shortscience.org/paper?bibtexKey=journals/corr/KearnesMBPR16#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/KearnesMBPR16#decodyngMon, 06 Apr 2020 06:30:03 +00001608.04844journals/corr/1608.048442Boosting Docking-based Virtual Screening with Deep LearningCodyWildMy objective in reading this paper was to gain another perspective on, and thus a more well-grounded view of, machine learning scoring functions for docking-based prediction of ligand/protein binding affinity. As quick background context, these models are useful because many therapeutic compounds act by binding to a target protein, and it can be valuable to prioritize doing wet lab testing on compounds that are predicted to have a stronger binding affinity. Docking systems work by predicting the...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.04844#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1608.04844#decodyngSat, 04 Apr 2020 05:03:25 +00001910.02845journals/corr/1910.028453Combining docking pose rank and structure with deep learning improves protein-ligand binding mode predictionCodyWildThis paper focuses on the application of deep learning to the docking problem within rational drug design. The overall objective of drug design or discovery is to build predictive models of how well a candidate compound (or "ligand") will bind with a target protein, to help inform the decision of what compounds are promising enough to be worth testing in a wet lab. Protein binding prediction is important because many small-molecule drugs, which are designed to be small enough to get through cell...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1910.02845#decodyng
http://www.shortscience.org/paper?bibtexKey=journals/corr/1910.02845#decodyngFri, 03 Apr 2020 05:28:05 +00001910.01708journals/corr/1910.017083Benchmarking Batch Deep Reinforcement Learning AlgorithmsRobert MüllerThe authors propose a unified setting to evaluate the performance of batch reinforcement learning algorithms. The proposed benchmark is discrete and based on the popular Atari Domain. The authors review and benchmark several current batch RL algorithms against a newly introduced version of BCQ (Batch Constrained Deep Q Learning) for discrete environments.
Note in line 5 that the policy chooses actions with a restricted argmax operation, eliminating actions that do not have enough support in the...
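The restricted argmax mentioned above can be sketched as follows. This is a minimal sketch, assuming the discrete BCQ rule in which an action is kept only if its probability under a learned model of the behavior policy, relative to the most likely action, reaches a threshold. The function name `restricted_argmax`, the parameter name `tau`, its default value, and the use of `>=` for the comparison are assumptions made for illustration and are not taken from the summary.

```python
def restricted_argmax(q_values, behavior_probs, tau=0.3):
    """Pick the highest-Q action among actions with enough batch support.

    q_values: estimated Q(s, a) for each discrete action a.
    behavior_probs: estimated probability of each action under the
        behavior policy in state s (assumed to contain a positive entry).
    tau: relative support threshold in [0, 1] (hypothetical default).
    """
    max_prob = max(behavior_probs)
    # Keep actions whose probability relative to the most likely action
    # is at least tau; the most likely action itself is always kept.
    allowed = [a for a, g in enumerate(behavior_probs) if g / max_prob >= tau]
    return max(allowed, key=lambda a: q_values[a])


# Action 1 has the highest Q-value but almost no support in the batch,
# so it is filtered out and action 2 is chosen instead.
print(restricted_argmax([1.0, 5.0, 2.0], [0.60, 0.05, 0.35], tau=0.3))  # 2
```

With `tau=0` every action is allowed and the rule reduces to the ordinary greedy argmax; with `tau=1` only the most likely behavior action survives, which corresponds to imitating the behavior policy.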
http://www.shortscience.org/paper?bibtexKey=journals/corr/1910.01708#robertmueller
http://www.shortscience.org/paper?bibtexKey=journals/corr/1910.01708#robertmuellerFri, 27 Mar 2020 14:40:38 +0000conf/icml/FujimotoMP193Off-Policy Deep Reinforcement Learning without ExplorationRobert MüllerInteracting with the environment sometimes comes at a high cost, for example in high-stakes scenarios like health care or teaching. Thus, instead of learning online, we might want to learn from a fixed buffer $B$ of transitions, which is filled in advance from a behavior policy.
The authors show that several so-called off-policy algorithms, like DQN and DDPG, fail dramatically in this pure off-policy setting.
They attribute this to the extrapolation error, which occurs in the update of a value es...
http://www.shortscience.org/paper?bibtexKey=conf/icml/FujimotoMP19#robertmueller
http://www.shortscience.org/paper?bibtexKey=conf/icml/FujimotoMP19#robertmuellerWed, 25 Mar 2020 10:07:55 +00002003.05856journals/corr/2003.058565Online Fast Adaptation and Knowledge Accumulation: a New Approach to Continual LearningMassimo Cacciadisclaimer: I'm the first author of the paper
## TL;DR
We have made a lot of progress on catastrophic forgetting within the standard evaluation protocol,
i.e. sequentially learning a stream of tasks and testing our models' capacity to remember them all.
We think it's time for a new approach to Continual Learning (CL), coined OSAKA, which is more aligned with real-life applications of CL. It brings CL closer to Online Learning and Open-World Learning.
main modifications we propose:
- bring CL cl...
http://www.shortscience.org/paper?bibtexKey=journals/corr/2003.05856#mcaccia
http://www.shortscience.org/paper?bibtexKey=journals/corr/2003.05856#mcacciaThu, 19 Mar 2020 16:41:59 +00001905.12558journals/corr/1905.125583Limitations of the Empirical Fisher Approximation for Natural Gradient DescentRobert MüllerIn this very well written paper, the authors analyse the relation between the Fisher $F(\theta) = \sum_n \mathbb{E}_{p_{\theta}(y \vert x_n)}[\nabla_{\theta} \log(p_{\theta}(y \vert x_n))\nabla_{\theta} \log(p_{\theta}(y \vert x_n))^T]$ and the empirical Fisher $\bar{F}(\theta) = \sum_n [\nabla_{\theta} \log(p_{\theta}(y_n \vert x_n))\nabla_{\theta} \log(p_{\theta}(y_n \vert x_n))^T]$, which has recently seen a surge in interest. The definitions differ in that $y_n$ is a training label instead of a samp...
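To make the difference between the two definitions above concrete, here is a minimal Python sketch for binary logistic regression, where $p_{\theta}(y = 1 \vert x) = \sigma(\theta^T x)$ and the gradient of the log-likelihood is $(y - \sigma(\theta^T x))\,x$. The choice of model, the function names, and the toy data are assumptions made for illustration; they are not taken from the paper or the summary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _weighted_outer_sum(X, weights):
    """Return sum_n weights[n] * x_n x_n^T as a list of lists."""
    d = len(X[0])
    F = [[0.0] * d for _ in range(d)]
    for x, w in zip(X, weights):
        for i in range(d):
            for j in range(d):
                F[i][j] += w * x[i] * x[j]
    return F


def _probs(theta, X):
    return [sigmoid(sum(t * x_i for t, x_i in zip(theta, x))) for x in X]


def fisher(theta, X):
    # Expectation over y ~ p_theta(y | x_n): E[(y - p)^2] = p * (1 - p),
    # so the training labels never enter this quantity.
    return _weighted_outer_sum(X, [p * (1.0 - p) for p in _probs(theta, X)])


def empirical_fisher(theta, X, y):
    # Uses the observed training label y_n instead of a sample from the model.
    return _weighted_outer_sum(
        X, [(y_n - p) ** 2 for y_n, p in zip(y, _probs(theta, X))]
    )


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 0.0, 1.0]
print(fisher([2.0, 0.0], X))
print(empirical_fisher([2.0, 0.0], X, y))
```

In this toy model the two matrices coincide at $\theta = 0$, where every prediction is 0.5 and $(y_n - 0.5)^2 = 0.25$ for any binary label, and they differ once the predictions move away from 0.5.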
http://www.shortscience.org/paper?bibtexKey=journals/corr/1905.12558#robertmueller
http://www.shortscience.org/paper?bibtexKey=journals/corr/1905.12558#robertmuellerThu, 19 Mar 2020 08:59:52 +0000conf/nips/BafnaMV183Thwarting Adversarial Examples: An L_0-Robust Sparse Fourier TransformDavid StutzBafna et al. show that iterative hard thresholding results in $L_0$ robust Fourier transforms. In particular, as shown in Algorithm 1, iterative hard thresholding assumes a signal $y = x + e$ where $x$ is assumed to be sparse, and $e$ is assumed to be sparse. This translates to noise $e$ that is bounded in its $L_0$ norm, corresponding to common adversarial attacks such as adversarial patches in computer vision. Using their algorithm, the authors can provably reconstruct the signal, specifically...
http://www.shortscience.org/paper?bibtexKey=conf/nips/BafnaMV18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/nips/BafnaMV18#davidstutzSat, 14 Mar 2020 23:31:48 +00001809.08758journals/corr/1809.087582Low Frequency Adversarial PerturbationDavid StutzGuo et al. propose to augment black-box adversarial attacks with low-frequency noise to obtain low-frequency adversarial examples as shown in Figure 1. To this end, the boundary attack as well as the NES attack are modified to sample from a low-frequency Gaussian distribution instead of from Gaussian noise directly. This is achieved through an inverse discrete cosine transform as detailed in the paper.
Figure 1: Example of a low-frequency adversarial example.
Also find this summary at [davidstut...
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.08758#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1809.08758#davidstutzSat, 14 Mar 2020 23:27:21 +000010.1109/cvprw.2018.002123Semantic Adversarial ExamplesDavid StutzHosseini and Poovendran propose semantic adversarial examples by randomly manipulating hue and saturation of images. In particular, in an iterative algorithm, hue and saturation are randomly perturbed and projected back to their valid range. If this results in mis-classification the perturbed image is returned as the adversarial example and the algorithm is finished; if not, another iteration is run. The result is shown in Figure 1. As can be seen, the structure of the images is retained while h...
http://www.shortscience.org/paper?bibtexKey=10.1109/cvprw.2018.00212#davidstutz
http://www.shortscience.org/paper?bibtexKey=10.1109/cvprw.2018.00212#davidstutzSat, 14 Mar 2020 23:17:20 +0000conf/icml/KarmonZG182LaVAN: Localized and Visible Adversarial NoiseDavid StutzKarmon et al. propose a gradient-descent based method for obtaining localized adversarial examples similar to adversarial patches. In particular, after selecting a region of the image to be modified, several iterations of gradient descent are run in order to maximize the probability of the target class and simultaneously minimize the probability of the true class. After each iteration, the perturbation is masked to the patch and projected onto the valid range of [0,1] for images. On ImageNet, the author...
http://www.shortscience.org/paper?bibtexKey=conf/icml/KarmonZG18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/icml/KarmonZG18#davidstutzSat, 14 Mar 2020 23:13:00 +00001904.00759journals/corr/abs-1904-007592Adversarial camera stickers: A physical camera-based attack on deep learning systemsDavid StutzLi et al. propose camera stickers that, when computed adversarially and physically attached to the camera, lead to mis-classification. As illustrated in Figure 1, these stickers are realized using circular patches of uniform color. These individual circular stickers are computed in a gradient-descent fashion by optimizing their location, color and radius. The influence of the camera on these stickers is modeled realistically in order to guarantee success.
Figure 1: Illustration of adversarial s...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-00759#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1904-00759#davidstutzSat, 14 Mar 2020 22:54:51 +000010.1109/wacv.2019.001432Local Gradients Smoothing: Defense Against Localized Adversarial AttacksDavid StutzNaseer et al. propose to smooth local gradients as defense against adversarial patches. In particular, as illustrated in Figure 1, the local image gradient is computed through convolution. Then, in local, overlapping windows, the gradients are set to zero if the total sum of absolute gradient values exceeds a specific threshold. The remaining gradient map is supposed to indicate regions where it is likely that adversarial patches can be found. Using this gradient map, the image is smoothed, i.e....
http://www.shortscience.org/paper?bibtexKey=10.1109/wacv.2019.00143#davidstutz
http://www.shortscience.org/paper?bibtexKey=10.1109/wacv.2019.00143#davidstutzSat, 14 Mar 2020 22:51:20 +0000conf/raid/ZuoYL0192Exploiting the Inherent Limitation of L0 Adversarial ExamplesDavid StutzZuo et al. propose a two-stage system for detecting $L_0$ adversarial examples. Their system is based on the following two observations: (a) $L_0$ adversarial examples often result in very drastic changes of individual pixels and (b) these pixels are usually isolated and scattered over the image. Thus, they propose to train a siamese network to detect adversarial examples. To this end, they use a pre-processor and train the network to detect adversarial examples by taking the input and the pre-p...
http://www.shortscience.org/paper?bibtexKey=conf/raid/ZuoYL019#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/raid/ZuoYL019#davidstutzSat, 14 Mar 2020 22:48:50 +0000conf/iclr/LeeAJ193Towards Robust, Locally Linear Deep NetworksDavid StutzLee et al. propose a regularizer to increase the size of linear regions of rectified deep networks around training and test points. Specifically, they assume piece-wise linear networks, in their simplest form consisting of linear layers (fully connected layers, convolutional layers) and ReLU activation functions. In these networks, linear regions are determined by activation patterns, i.e., a pattern indicating which neurons have values greater than zero. Then, the goal is to compute, and la...
http://www.shortscience.org/paper?bibtexKey=conf/iclr/LeeAJ19#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/iclr/LeeAJ19#davidstutzFri, 13 Mar 2020 22:25:08 +0000conf/aaai/LiuYLSCL192DPATCH: An Adversarial Patch Attack on Object DetectorsDavid StutzLiu et al. propose DPatch, adversarial patches against state-of-the-art object detectors. Similar to existing adversarial patches, where a patch with fixed pixels is placed in an image in order to evade (or change) classification, the authors compute their DPatch using an optimization procedure. During optimization, the patch to be optimized is placed at random locations on all images of a dataset, e.g., PASCAL VOC 2007, and the pixels are updated in order to maximize the loss of the classifier (either ...
http://www.shortscience.org/paper?bibtexKey=conf/aaai/LiuYLSCL19#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/aaai/LiuYLSCL19#davidstutzFri, 13 Mar 2020 22:16:25 +0000conf/nips/SalmanLRZZBY192Provably Robust Deep Learning via Adversarially Trained Smoothed ClassifiersDavid StutzSalman et al. combine randomized smoothing with adversarial training based on an attack specifically designed against smoothed classifiers. Specifically, they consider the formulation of randomized smoothing by Cohen et al. [1]; here, Gaussian noise around the input (adversarial or clean) is sampled and the classifier takes a simple majority vote. In [1], Cohen et al. show that this results in good bounds on robustness. In this paper, Salman et al. propose an adaptive attack against randomized ...
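The majority vote under Gaussian noise mentioned above can be sketched as follows. This is a minimal, assumption-laden illustration: `classifier` is any callable returning a label, the default values of `sigma` and `n_samples` are arbitrary, and the certification step that yields the robustness bounds is omitted.

```python
import random
from collections import Counter

def smoothed_predict(classifier, x, sigma=0.25, n_samples=100, seed=0):
    """Predict by majority vote over Gaussian-perturbed copies of x.

    classifier: hypothetical callable mapping a list of floats to a label.
    x: list of floats (the input, clean or adversarial).
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        # Sample isotropic Gaussian noise around the input.
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[classifier(noisy)] += 1
    # Return the label that received the most votes.
    return votes.most_common(1)[0][0]
```

For example, with a toy classifier `lambda v: int(sum(v) > 0)`, inputs far from the decision boundary keep their label under smoothing.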
http://www.shortscience.org/paper?bibtexKey=conf/nips/SalmanLRZZBY19#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/nips/SalmanLRZZBY19#davidstutzFri, 13 Mar 2020 22:07:15 +0000conf/ccs/LambVKB192Interpolated Adversarial Training: Achieving Robust Neural Networks Without Sacrificing Too Much AccuracyDavid StutzLamb et al. propose interpolated adversarial training to increase robustness against adversarial examples. Particularly, a $50\%/50\%$ variant of adversarial training is used, i.e., in each iteration the batch consists of $50\%$ clean and $50\%$ adversarial examples. The loss is then computed on both parts, encouraging the network to predict the correct labels on the adversarial examples, and averaged afterwards. In interpolated adversarial training, the loss is adapted according to the Mi...
http://www.shortscience.org/paper?bibtexKey=conf/ccs/LambVKB19#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/ccs/LambVKB19#davidstutzFri, 13 Mar 2020 21:59:51 +0000conf/nips/Bartlett962For Valid Generalization the Size of the Weights is More Important than the Size of the NetworkDavid StutzBartlett shows that lower generalization bounds for multi-layer perceptrons with limited sizes of the weights can be found using the so-called fat-shattering dimension. Similar to the classical VC dimension, the fat-shattering dimension quantifies the expressiveness of hypothesis classes in machine learning. Specifically, considering a sequence of points $x_1, \ldots, x_d$, a hypothesis class $H$ is said to shatter this sequence if, for any label assignment $b_1, \ldots, b_d \in \{-1,1\}$, a fu...
http://www.shortscience.org/paper?bibtexKey=conf/nips/Bartlett96#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/nips/Bartlett96#davidstutzFri, 13 Mar 2020 21:55:39 +00001905.03837duesterwald2019exploring3Exploring the Hyperparameter Landscape of Adversarial RobustnessDavid StutzDuesterwald et al. study the influence of hyperparameters on adversarial training and its robustness as well as accuracy. As shown in Figure 1, the chosen parameters, the ratio of adversarial examples per batch and the allowed perturbation $\epsilon$, allow controlling the trade-off between adversarial robustness and accuracy. Even for larger $\epsilon$, at least on MNIST and SVHN, using only a few adversarial examples per batch increases robustness significantly while only incurring a small loss i...
http://www.shortscience.org/paper?bibtexKey=duesterwald2019exploring#davidstutz
http://www.shortscience.org/paper?bibtexKey=duesterwald2019exploring#davidstutzThu, 12 Mar 2020 22:07:26 +00001901.09878journals/corr/abs-1901-098782CapsAttacks: Robust and Imperceptible Adversarial Attacks on Capsule NetworksDavid StutzMarchisio et al. propose a black-box adversarial attack on Capsule Networks. The main idea of the attack is to select pixels based on their local standard deviation. Given a window of allowed pixels to be manipulated, these are sorted based on standard deviation and possible impact on the predicted probability (i.e., gap between target class probability and maximum other class probability). A subset of these pixels is then manipulated by a fixed noise value $\delta$. In experiments, the attack i...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1901-09878#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1901-09878#davidstutzThu, 12 Mar 2020 22:00:51 +00001704.03453journals/corr/TramerPGBM172The Space of Transferable Adversarial ExamplesDavid StutzTramer et al. study adversarial subspaces, subspaces of the input space that are spanned by multiple, orthogonal adversarial examples. This is achieved by iteratively searching for orthogonal adversarial examples, relative to a specific test example. This can, for example, be done using classical second- or first-order optimization methods for finding adversarial examples with the additional constraint of finding orthogonal adversarial examples. However, the authors also consider different attac...
http://www.shortscience.org/paper?bibtexKey=journals/corr/TramerPGBM17#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/TramerPGBM17#davidstutzThu, 12 Mar 2020 21:50:49 +00001906.05419journals/corr/abs-1906-054192Efficient Evaluation-Time Uncertainty Estimation by Improved DistillationDavid StutzEnglesson and Azizpour propose an adapted knowledge distillation version to improve confidence calibration on out-of-distribution examples including adversarial examples. In contrast to vanilla distillation, they make the following changes: First, high capacity student networks are used, for example, by increasing depth or width. Then, the target distribution is “sharpened” using the true label by reducing the distribution's overall entropy. Finally, for wrong predictions of the teacher model,...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-05419#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-05419#davidstutzMon, 09 Mar 2020 21:59:38 +0000conf/iclr/HendrycksD192Benchmarking Neural Network Robustness to Common Corruptions and PerturbationsDavid StutzHendrycks and Dietterich propose ImageNet-C and ImageNet-P benchmarks for corruption and perturbation robustness evaluation. Both datasets come in various sizes, and corruptions always come in different difficulties. The used corruptions include many common, realistic noise types such as various types of blur and random noise, brightness changes and compression artifacts. ImageNet-P differs from ImageNet-C in that sequences of perturbations are generated. This means, for a specific perturbation ...
http://www.shortscience.org/paper?bibtexKey=conf/iclr/HendrycksD19#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/iclr/HendrycksD19#davidstutzMon, 09 Mar 2020 21:57:45 +00001905.06455journals/corr/abs-1905-064552On Norm-Agnostic Robustness of Adversarial TrainingDavid StutzLi et al. evaluate adversarial training using both $L_2$ and $L_\infty$ attacks and propose a second-order attack. The main motivation of the paper is to show that adversarial training cannot increase robustness against both $L_2$ and $L_\infty$ attacks. To this end, they propose a second-order adversarial attack and experimentally show that ensemble adversarial training can partly solve the problem.
Also find this summary at [davidstutz.de]().
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1905-06455#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1905-06455#davidstutzMon, 09 Mar 2020 21:41:28 +00001906.02611journals/corr/abs-1906-026112Improving Robustness Without Sacrificing Accuracy with Patch Gaussian AugmentationDavid StutzLopes et al. propose patch-based Gaussian data augmentation to improve accuracy and robustness against common corruptions. Their approach is intended to be an interpolation between Gaussian noise data augmentation and CutOut. During training, random patches on images are selected and random Gaussian noise is added to these patches. With increasing noise level (i.e., its standard deviation) this results in CutOut; with increasing patch size, this results in regular Gaussian noise data augmentatio...
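The augmentation described above (select a random patch, add Gaussian noise inside it) can be sketched on a single-channel image stored as a 2D list. This is an illustrative sketch only: the uniform sampling of the patch centre, the clipping of the result to [0,1], and the name `patch_gaussian` are assumptions and not taken from the paper.

```python
import random

def patch_gaussian(image, patch_size, sigma, seed=None):
    """Add Gaussian noise to one randomly placed square patch.

    image: 2D list of floats in [0, 1]; it is not modified in place.
    patch_size: side length of the patch (clipped at the image border).
    sigma: standard deviation of the added noise.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    # ASSUMPTION: the patch centre is drawn uniformly over the image.
    cy, cx = rng.randrange(height), rng.randrange(width)
    half = patch_size // 2
    top, bottom = max(0, cy - half), min(height, cy + half + 1)
    left, right = max(0, cx - half), min(width, cx + half + 1)
    out = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            noisy = out[y][x] + rng.gauss(0.0, sigma)
            out[y][x] = min(1.0, max(0.0, noisy))  # ASSUMPTION: clip to [0, 1]
    return out
```

A patch that covers the whole image recovers plain Gaussian noise augmentation, while a very large `sigma` saturates the patch at 0 or 1, which behaves much like occlusion.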
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-02611#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-02611#davidstutzMon, 09 Mar 2020 21:33:59 +00001906.02337journals/corr/abs-1906-023373MNIST-C: A Robustness Benchmark for Computer VisionDavid StutzMu and Gilmer introduce MNIST-C, an MNIST-based corruption benchmark for out-of-distribution evaluation. The benchmark includes various corruption types including random noise (shot and impulse noise), blur (glass and motion blur), (affine) transformations, “striping” or occluding parts of the image, using Canny images or simulating fog. These corruptions are also shown in Figure 1. The transformations have been chosen to be semantically invariant, meaning that the true class of the image do...
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-02337#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/abs-1906-02337#davidstutzMon, 09 Mar 2020 21:27:36 +0000conf/icml/TeyeAS183Bayesian Uncertainty Estimation for Batch Normalized Deep NetworksDavid StutzTeye et al. show that neural networks with batch normalization can be used to give uncertainty estimates through Monte Carlo sampling. In particular, instead of using the test mode of batch normalization, where the statistics (mean and variance) of each batch normalization layer are fixed, these statistics are computed per batch, as in training mode. To this end, for a specific query image, random batches from the training set are sampled, and prediction uncertainty is estimated using Monte Carl...
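The sampling procedure described above can be illustrated on a deliberately tiny model: a single scalar feature that is batch-normalized and then passed through a linear unit. Everything here is a hypothetical toy (the model, the name `mc_batchnorm_predict`, and the default parameters); it only shows how per-batch statistics turn one query into a distribution of predictions.

```python
import random
import statistics

def mc_batchnorm_predict(query, train_data, weight=1.0, bias=0.0,
                         batch_size=8, n_samples=50, eps=1e-5, seed=0):
    """Monte Carlo estimate of prediction mean and variance.

    query: scalar input whose prediction uncertainty is wanted.
    train_data: list of scalar training inputs to draw batches from.
    The 'network' is a toy: weight * batchnorm(query) + bias.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        # Batch statistics are recomputed per sampled batch (training mode).
        batch = rng.sample(train_data, batch_size)
        mu = statistics.fmean(batch)
        var = statistics.pvariance(batch, mu)
        normalized = (query - mu) / (var + eps) ** 0.5
        outputs.append(weight * normalized + bias)
    # The spread of the outputs serves as the uncertainty estimate.
    return statistics.fmean(outputs), statistics.pvariance(outputs)
```

With fixed test-mode statistics every pass would return the same value; the variance returned here comes purely from the randomness of the sampled batches.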
http://www.shortscience.org/paper?bibtexKey=conf/icml/TeyeAS18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/icml/TeyeAS18#davidstutzMon, 09 Mar 2020 21:19:42 +00001607.06450journals/corr/1607.064502Layer NormalizationDavid StutzBa et al. propose layer normalization, normalizing the activations of a layer by its mean and standard deviation. In contrast to batch normalization, this scheme does not depend on the current batch; thus, it performs the same computation at training and test time. The general scheme, however, is very similar. Given the $l$-th layer of a multi-layer perceptron,
$a_i^l = (w_i^l)^T h^l$ and $h_i^{l + 1} = f(a_i^l + b_i^l)$
with $W^l$ being the weight matrix, the activations $a_i^l$ are normalize...
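As a small worked example, the normalization of the pre-activations $a^l$ of one layer can be sketched as below. The statistics are taken over the units of the layer for a single input, so no batch is involved. The optional `gain` and `bias` vectors and the `eps` term added for numerical stability are assumptions of this sketch.

```python
import math

def layer_norm(a, gain=None, bias=None, eps=1e-5):
    """Normalize the pre-activations of one layer for a single input.

    a: list of pre-activations a_i of the layer.
    gain, bias: optional per-unit scale and shift (default: 1 and 0).
    """
    n = len(a)
    mu = sum(a) / n                                        # mean over units
    sigma = math.sqrt(sum((x - mu) ** 2 for x in a) / n)   # std over units
    if gain is None:
        gain = [1.0] * n
    if bias is None:
        bias = [0.0] * n
    # eps is an ASSUMPTION to avoid division by zero.
    return [g * (x - mu) / (sigma + eps) + b
            for x, g, b in zip(a, gain, bias)]
```

Because `mu` and `sigma` depend only on the current input, the same function is used unchanged at training and test time.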
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.06450#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.06450#davidstutzSun, 08 Mar 2020 19:20:46 +00001802.08760journals/corr/1802.087602Sensitivity and Generalization in Neural Networks: an Empirical StudyDavid StutzNovak et al. study the relationship between neural network sensitivity and generalization. Here, sensitivity is measured in terms of the Frobenius norm of the gradient of the network’s probabilities (resulting in a Jacobian matrix, not depending on the true label) or based on a coding scheme of activations. The latter is intended to quantify transitions between linear regions of the piece-wise linear model. To this end, all activations are assigned either $0$ or $1$ depending on their ReLU output. Based o...
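The activation coding described above can be sketched for a small fully connected ReLU network. The sketch assigns $1$ to every unit with positive pre-activation and $0$ otherwise, then counts how often the code changes along a path between two inputs. The straight-line path, the number of steps, and the names `activation_code` and `count_transitions` are assumptions of this illustration, not the paper's exact protocol.

```python
def activation_code(x, layers):
    """Return the 0/1 activation pattern of a ReLU network at input x.

    layers: list of (weights, biases); weights is a list of rows,
    one row per unit of the layer.
    """
    code, h = [], x
    for weights, biases in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + b
               for row, b in zip(weights, biases)]
        code.extend(1 if p > 0 else 0 for p in pre)   # 1 = unit is active
        h = [max(0.0, p) for p in pre]                 # ReLU output
    return tuple(code)

def count_transitions(x_start, x_end, layers, steps=100):
    """Count activation-code changes along a straight line (ASSUMED path)."""
    transitions = 0
    previous = activation_code(x_start, layers)
    for t in range(1, steps + 1):
        alpha = t / steps
        point = [(1 - alpha) * a + alpha * b for a, b in zip(x_start, x_end)]
        current = activation_code(point, layers)
        if current != previous:
            transitions += 1   # the path entered a different linear region
        previous = current
    return transitions
```

A path that stays inside one linear region yields zero transitions; more transitions indicate that the function changes its linear piece more often along the path.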
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.08760#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1802.08760#davidstutzSun, 08 Mar 2020 18:34:58 +00001607.08022journals/corr/1607.080222Instance Normalization: The Missing Ingredient for Fast StylizationDavid StutzIn the context of stylization, Ulyanov et al. propose to use instance normalization instead of batch normalization. In detail, instance normalization does not compute the mean and standard deviation used for normalization over the current mini-batch in training. Instead, these statistics are computed per instance individually. This also has the benefit of having the same training and test procedure, meaning that normalization is the same in both cases – in contrast to batch normalization.
Als...
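A minimal sketch of instance normalization on a nested-list tensor of shape `[N][C][H][W]` follows. Mean and standard deviation are computed per instance and per channel over the spatial positions, so the result for one image never depends on the other images in the batch. The `eps` term and the omission of any learnable scale and shift are assumptions of this sketch.

```python
import math

def instance_norm(x, eps=1e-5):
    """Normalize each (instance, channel) plane of x independently.

    x: nested lists indexed as x[n][c][h][w].
    """
    out = []
    for sample in x:
        normed_sample = []
        for channel in sample:
            values = [v for row in channel for v in row]
            mu = sum(values) / len(values)
            var = sum((v - mu) ** 2 for v in values) / len(values)
            scale = 1.0 / math.sqrt(var + eps)   # eps is an ASSUMPTION
            normed_sample.append(
                [[(v - mu) * scale for v in row] for row in channel])
        out.append(normed_sample)
    return out
```

Since no batch statistics are involved, calling the function on a batch of size one gives the same result for that image as calling it on a larger batch.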
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.08022#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1607.08022#davidstutzSun, 08 Mar 2020 18:21:50 +00001803.08494journals/corr/1803.084942Group NormalizationDavid StutzWu and He propose group normalization as alternative to batch normalization. Instead of computing the statistics used for normalization based on the current mini-batch, group normalization computes these statistics per instance but in groups of channels (for convolutional layers). Specifically, given activations $x_i$ with $i = (i_N, i_C, i_H, i_W)$ indexing along batch size, channels, height and width, batch normalization computes
$\mu_i = \frac{1}{|S|}\sum_{k \in S} x_k$ and $\sigma_i = \sqrt...
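Group normalization as described above, with statistics computed per instance over groups of channels, can be sketched on a nested-list tensor of shape `[N][C][H][W]`. In terms of the formula, the set $S$ then contains all positions of the same instance whose channel falls into the same group. The `eps` term and the omission of learnable per-channel scale and shift are assumptions of this sketch.

```python
import math

def group_norm(x, num_groups, eps=1e-5):
    """Normalize x[n][c][h][w] per instance over groups of channels."""
    out = []
    for sample in x:
        channels = len(sample)
        if channels % num_groups != 0:
            raise ValueError("number of channels must be divisible by num_groups")
        size = channels // num_groups
        normed = []
        for g in range(num_groups):
            group = sample[g * size:(g + 1) * size]
            # S: every spatial position of every channel in this group.
            values = [v for ch in group for row in ch for v in row]
            mu = sum(values) / len(values)
            var = sum((v - mu) ** 2 for v in values) / len(values)
            scale = 1.0 / math.sqrt(var + eps)   # eps is an ASSUMPTION
            for ch in group:
                normed.append([[(v - mu) * scale for v in row] for row in ch])
        out.append(normed)
    return out
```

Setting `num_groups` to the number of channels normalizes every channel separately, and `num_groups=1` normalizes over all channels of an instance at once.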
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.08494#davidstutz
http://www.shortscience.org/paper?bibtexKey=journals/corr/1803.08494#davidstutzSun, 08 Mar 2020 18:10:53 +0000conf/nips/ZhangS182Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy LabelsDavid StutzZhang and Sabuncu propose a generalized cross entropy loss for robust learning on noisy labels. The approach is based on the work by Ghosh et al. [1] showing that the mean absolute error can be robust to label noise. Specifically, they show that a symmetric loss, under specific assumptions on the label noise, is robust. Here, symmetry corresponds to
$\sum_{j=1}^c \mathcal{L}(f(x), j) = C$ for all $x$ and $f$
where $c$ is the number of classes and $C$ some constant. The cross entropy loss is not...
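The symmetry condition can be checked numerically. The sketch below sums a loss over all $c$ possible labels for a fixed prediction and compares two losses: the absolute error between the one-hot label and the predicted distribution (taken here as the unnormalized $L_1$ distance, which is an assumption about scaling) and the cross entropy. The function names are hypothetical.

```python
import math

def mae_loss(probs, label):
    """L1 distance between the one-hot label and the predicted distribution."""
    return sum(abs((1.0 if k == label else 0.0) - p)
               for k, p in enumerate(probs))

def cross_entropy_loss(probs, label):
    """Negative log-probability of the given label."""
    return -math.log(probs[label])

def loss_sum_over_labels(loss, probs):
    """Left-hand side of the symmetry condition: sum of the loss over j."""
    return sum(loss(probs, j) for j in range(len(probs)))
```

For any probability vector, the absolute-error sum equals $2c - 2$, because each term is $2 - 2p_j$ and the $p_j$ sum to one. The cross-entropy sum changes with the prediction, so it does not satisfy the condition.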
http://www.shortscience.org/paper?bibtexKey=conf/nips/ZhangS18#davidstutz
http://www.shortscience.org/paper?bibtexKey=conf/nips/ZhangS18#davidstutzSun, 08 Mar 2020 18:03:29 +0000