ShortScience.org Latest Summaries

https://shortscience.org 60 Thu, 25 Apr 2024 01:00:02 +0000 2110.11309 journals/corr/2110.11309 2 Fast Model Editing at Scale Joseph Paul Cohen The goal of this work is to edit the model’s weights given new edit pairs ($x_e, y_e$) at test time. They achieve this by learning a "model editor network" that takes a fine-tuning gradient computed from ($x_e, y_e$) and transforms it into a weight update. $$ f(\nabla W_l) \rightarrow \tilde\nabla W_l$$ The editor network is parameterized by the layer that it is predicting using a FiLM-style scale and shift. The editor network is trained on a small set of examples ($D^{tr}_{edit}$). The... https://shortscience.org/paper?bibtexKey=journals/corr/2110.11309#joecohen https://shortscience.org/paper?bibtexKey=journals/corr/2110.11309#joecohen Thu, 08 Feb 2024 00:02:46 +0000 2006.15055 journals/corr/2006.15055 2 Object-Centric Learning with Slot Attention ngthanhtinqn The Slot Attention module maps from a set of N input feature vectors to a set of K output vectors that we refer to as slots. Each vector in this output set can, for example, describe an object or an entity in the input. https://shortscience.org/paper?bibtexKey=journals/corr/2006.15055#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2006.15055#ngthanhtinqn Fri, 24 Mar 2023 17:52:29 +0000 2101.07042 journals/corr/2101.07042 2 CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition ngthanhtinqn This paper tackles zero-shot action recognition using a cluster-based representation. Concretely, it uses REINFORCE, a reinforcement learning algorithm, to optimize the centroids, with the classification scores as the reward signal.
https://shortscience.org/paper?bibtexKey=journals/corr/2101.07042#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2101.07042#ngthanhtinqn Fri, 24 Mar 2023 17:49:18 +0000 2302.14045 journals/corr/2302.14045 2 Language Is Not All You Need: Aligning Perception with Language Models ngthanhtinqn This paper is about Multimodal Large Language Model (MLLM). In this paper, they proposed an MLLM model called KOSMOS-1 that can do instruction following, VQA, IQ-testing, visual dialog, etc. The input of this model is image-caption pairs and interleaved data of images and texts. The input data will be fed into an embedding module to encode the data into vectors, then the vectors will be fed into a Transformer Decoder. Then the decoder will predict the next token based on the previous cont... https://shortscience.org/paper?bibtexKey=journals/corr/2302.14045#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2302.14045#ngthanhtinqn Wed, 01 Mar 2023 17:00:44 +0000 1807.00517 journals/corr/1807.00517 2 Women also Snowboard: Overcoming Bias in Captioning Models (Extended Abstract) ngthanhtinqn This paper is to reduce gender bias in the captioning model. Concretely, traditional captioning models tend to rely on contextual cues, so they usually predict incorrect captions for an image that contains people. To reduce gender bias, they introduced a new $Equalizer$ model that contains two losses: (1) Appearance Confusion Loss: When it is hard to tell if there is a man or a woman in the image, the model should provide a fair probability of predicting a man or a woman. To define that loss,... 
https://shortscience.org/paper?bibtexKey=journals/corr/1807.00517#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/1807.00517#ngthanhtinqn Tue, 28 Feb 2023 05:36:09 +0000 2203.16639 journals/corr/2203.16639 2 FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations ngthanhtinqn This paper proposed a method to locate an object based on an image and a sentence describing objects in the image. Then, predicting a new visual concept embedding based on two graphs (1) a graph that describes the relationship between objects in a supplemental sentence describing several objects, and (2) a graph that describes the relationship between the detected object in the image and example images related to objects in the supplementary sentence. This embedding can be used for many downstre... https://shortscience.org/paper?bibtexKey=journals/corr/2203.16639#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2203.16639#ngthanhtinqn Sun, 26 Feb 2023 20:17:14 +0000 2211.11158 journals/corr/2211.11158 2 Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification ngthanhtinqn what is the paper doing? This paper proposed a way to explain the model decision by human-readable concepts. For example, if the model thinks the following image is a black-throated sparrow, then a human can understand this decision via input descriptors. The descriptors were obtained from GPT-3, they got 500 descriptors for each class and then remove the class name in each descriptor. Then, for each class, they chose $k$ concepts to make sure that every class has an equal amount of concepts.... 
https://shortscience.org/paper?bibtexKey=journals/corr/2211.11158#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2211.11158#ngthanhtinqn Fri, 24 Feb 2023 04:50:05 +0000 2203.11876 journals/corr/2203.11876 2 Open-Vocabulary DETR with Conditional Matching ngthanhtinqn The paper proposed a new object detection method to detect novel classes by using Conditional Matching. This detector can be conditioned on either image or text, which means a user can use an image or text to let the model detect the corresponding bounding boxes in the picture. This model has 2 changes compared to other open-vocabulary detectors: 1) Other detectors rely on Region Proposal Network (RPN) which can not cover all the objects in a picture, so it will worsen the performance of detec... https://shortscience.org/paper?bibtexKey=journals/corr/2203.11876#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2203.11876#ngthanhtinqn Wed, 22 Feb 2023 17:10:43 +0000 2203.17271 journals/corr/2203.17271 2 Do Vision-Language Pretrained Models Learn Composable Primitive Concepts? ngthanhtinqn This paper proposed a way to do classification using primitive concepts such as color, shape, texture, etc. The framework is simple, they have two sub-models: (1) the first one is a trained VL model such as CLIP, ViLT, and ALBEF. The input of this step is the primitive concepts or let's say, attribute concepts and an image, then the output will be the scores for each concept. (2) the second one is a linear model that uses the concepts and their scores to do classification. This model is trai... 
https://shortscience.org/paper?bibtexKey=journals/corr/2203.17271#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2203.17271#ngthanhtinqn Tue, 21 Feb 2023 21:25:57 +0000 2301.12597 journals/corr/2301.12597 2 BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models ngthanhtinqn This paper has a way to leverage pre-trained Vision Language encoders to do VL tasks such as VQA, and Image Captioning. To have a good VL model, the modality gap must be reduced. In this paper, they proposed a Q-Former which is a Transformer module that is trained first with a frozen image encoder, then trained with this frozen image encoder and a frozen text encoder (from a Large Language Model). The reason why the Q-Former needs to train in two stages is: (1) Trained with frozen image enc... https://shortscience.org/paper?bibtexKey=journals/corr/2301.12597#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2301.12597#ngthanhtinqn Tue, 21 Feb 2023 06:35:40 +0000 conf/nips/ChoiGMH19 2 Why Can't I Dance in the Mall? Learning to Mitigate Scene Bias in Action Recognition ngthanhtinqn This paper is to mitigate the scene bias in the action recognition task. Scene bias is defined as the model only focusing on scene or object information without paying attention to the actual activity. To mitigate this issue, the author proposed 2 additional types of loss: (1) scene adversarial loss that helps the network to learn features that are suitable for action but invariant to scene type. Hence, reduce the scene bias. (2) human mask confusion loss that prevents a model from predicting ... 
https://shortscience.org/paper?bibtexKey=conf/nips/ChoiGMH19#ngthanhtinqn https://shortscience.org/paper?bibtexKey=conf/nips/ChoiGMH19#ngthanhtinqn Sun, 19 Feb 2023 21:34:33 +0000 2210.04150 journals/corr/2210.04150 2 Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP ngthanhtinqn Open-vocabulary semantic segmentation generates semantic segment regions based on text descriptions. Because it is driven by text descriptions, the model can segment objects that were not seen in the training phase. Some works use two-stage methods that first create class-agnostic segments and then use CLIP to assign each segment to a phrase. To compute the prediction for an image, they ensemble two types of prediction scores. (1) If we want to classify a mask into $K$ classes,... https://shortscience.org/paper?bibtexKey=journals/corr/2210.04150#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2210.04150#ngthanhtinqn Fri, 17 Feb 2023 22:26:50 +0000 1802.05766 journals/corr/abs-1802-05766 2 Learning to Count Objects in Natural Images for Visual Question Answering ngthanhtinqn Visual Question Answering models struggle to count objects properly. This paper traces the problem to the Soft Attention module and proposes a module that can produce reliable counts from object proposals. There are two challenges in VQA counting tasks: (1) There is no ground truth label for the objects to be counted. (2) The additional module should not affect performance on non-counting problems. Why Soft Attention is not good for the counting task... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-1802-05766#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/abs-1802-05766#ngthanhtinqn Fri, 17 Feb 2023 07:14:51 +0000 2206.09959 journals/corr/2206.09959 2 Global Context Vision Transformers ngthanhtinqn Transformer is proposed to capture long-range information with the self-attention mechanism, but it comes with quadratic computation cost and lacks multi-resolution information. Then, Swin Transformer introduces local-window-self-attention to reduce the cost to linear w.r.t image size, shifted-window-attention to capture cross-window information and finally exploits multi-resolution information with hierarchical architecture. But shifted-window-attention struggles to capture long-range informati... https://shortscience.org/paper?bibtexKey=journals/corr/2206.09959#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2206.09959#ngthanhtinqn Thu, 16 Feb 2023 07:39:34 +0000 2301.13081 journals/corr/2301.13081 2 STAIR: Learning Sparse Text and Image Representation in Grounded Tokens ngthanhtinqn This paper aims to learn a sparse semantic representation of texts and images instead of a dense representation trained by CLIP or ALIGN. The sparse embeddings are achieved by: (1) For an input (image or text), extract it to a feature (using Transformer) $h$ where $h_{j}$ corresponds to the $jth$ word in the input. (2) Each $j$ word embedding will be transformed to $p(h_{j})$ in vocabulary space $V$ by using a mapping function (in this paper, this is BERT Masked Language Model MLM). So each ... 
https://shortscience.org/paper?bibtexKey=journals/corr/2301.13081#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2301.13081#ngthanhtinqn Tue, 14 Feb 2023 15:58:28 +0000 2201.05078 journals/corr/2201.05078 2 CLIP-Event: Connecting Text and Images with Event Structures ngthanhtinqn This work pushes vision-language pretraining models to comprehend events and associated argument (participant) roles. To achieve this, they created a framework including 3 steps: (1) Event structural knowledge extraction including (a) text extraction: using SOTA text information extraction system to extract events (ex: agent, entity, instrument), (b) image extraction: using Faster RCNN trained on Open Images to detect objects. (c) Primary event detection: the primary event is the event that... https://shortscience.org/paper?bibtexKey=journals/corr/2201.05078#ngthanhtinqn https://shortscience.org/paper?bibtexKey=journals/corr/2201.05078#ngthanhtinqn Wed, 08 Feb 2023 21:03:05 +0000 conf/eccv/ChenLYK0G0020 2 UNITER: UNiversal Image-TExt Representation Learning ngthanhtinqn This paper designs a generalized multimodal architecture that can solve all vision-language tasks. Concretely, they pre-train their model on 4 main tasks (MLM, ITM, WRA, MRM) and evaluate it on various downstream tasks (VQA, VCR, NLVR). As shown in Fig 1, UNITER first encodes image regions (visual features and bounding box features) and textual words (tokens and positions) into a common embedding space with Image Embedder and Text Embedder. Then, a Transformer module is applied to l... https://shortscience.org/paper?bibtexKey=conf/eccv/ChenLYK0G0020#ngthanhtinqn https://shortscience.org/paper?bibtexKey=conf/eccv/ChenLYK0G0020#ngthanhtinqn Mon, 06 Feb 2023 14:50:23 +0000 2105.05837 journals/corr/2105.05837 4 When Does Contrastive Visual Representation Learning Work? 
CodyWild This is a mildly silly paper to summarize, since there isn't really a new mechanism to understand, but rather a number of straightforward (and interesting!) empirical results that are also quite well-explained in the paper itself. That said, for the sake of a tiny bit more brevity than the paper itself provides, I'll try to pull out some of the conclusions I found the most interesting here. The general goal of this paper is to better understand the contours of when self-supervised representati... https://shortscience.org/paper?bibtexKey=journals/corr/2105.05837#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2105.05837#decodyng Wed, 24 Nov 2021 06:03:37 +0000 1911.05507 journals/corr/abs-1911-05507 4 Compressive Transformers for Long-Range Sequence Modelling CodyWild This paper is an interesting extension of earlier work, in the TransformerXL paper, that sought to give Transformers access to a "memory" beyond the scope of the subsequence where full self-attention was being performed. This was done by caching the activations from prior subsequences, and making them available to the subsequence currently being calculated in a "read-only" way, with gradients not propagated backwards. This had the effect of (1) reducing the maximum memory size compared to simply... https://shortscience.org/paper?bibtexKey=journals/corr/abs-1911-05507#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-1911-05507#decodyng Mon, 22 Nov 2021 06:34:58 +0000 2101.03961 journals/corr/2101.03961 4 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity CodyWild The idea of the Switch Transformer is to have more parameters available for a network to use, but to only use a small subset of those parameters for each example that's run through the network. 
This is achieved through a routing scheme, whereby a weighting layer is applied to each token and produces a set of logits/softmax weights over the set of possible experts. The token is then sent to the expert that was given the highest weight. The network is implemented such that different experts can ac... https://shortscience.org/paper?bibtexKey=journals/corr/2101.03961#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2101.03961#decodyng Fri, 19 Nov 2021 07:12:52 +0000 1807.11626 journals/corr/abs-1807-11626 4 MnasNet: Platform-Aware Neural Architecture Search for Mobile CodyWild When machine learning models need to run on personal devices, that implies a very particular set of constraints: models need to be fairly small and low-latency when run on a limited-compute device, without much loss in accuracy. A number of human-designed architectures have been engineered to try to solve for these constraints (depthwise convolutions, inverted residual bottlenecks), but this paper's goal is to use Neural Architecture Search (NAS) to explicitly optimize the architecture against l... https://shortscience.org/paper?bibtexKey=journals/corr/abs-1807-11626#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-1807-11626#decodyng Wed, 17 Nov 2021 02:23:03 +0000 2010.13321 journals/corr/abs-2010-13321 4 View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose CodyWild The goal of this paper is to learn a model that embeds 2D keypoints (the locations of specific key body parts in 2D space) representing a particular pose into a vector embedding where nearby points in embedding space are also nearby in 3D space. This sort of model is useful because the same 3D pose can generate a wide variety of 2D pose projections, and it can be useful to learn which apparently-distinct representations actually map to the same 3D pose. To do this, the basic approach used by th... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-2010-13321#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2010-13321#decodyng Tue, 16 Nov 2021 02:15:03 +0000 1602.05629 MahMoo16Communication 4 Communication-Efficient Learning of Deep Networks from Decentralized Data CodyWild Federated learning is the problem of training a model that incorporates updates from the data of many individuals, without having direct access to that data, or having to store it. This is potentially desirable both for reasons of privacy (not wanting to have access to private data in a centralized way), and for potential benefits to transport cost when data needed to train models exists on a user's device, and would require a lot of bandwidth to transfer to a centralized server. Historically... https://shortscience.org/paper?bibtexKey=MahMoo16Communication#decodyng https://shortscience.org/paper?bibtexKey=MahMoo16Communication#decodyng Mon, 15 Nov 2021 07:19:12 +0000 2110.15349 journals/corr/2110.15349 4 Learning to Ground Multi-Agent Communication with Autoencoders CodyWild In certain classes of multi-agent cooperation games, it's useful for agents to be able to coordinate on future actions, which is an obvious use case for having a communication channel between the two players. However, prior work in multi-agent RL has shown that it's surprisingly hard to train agents that (1) consistently learn to use a communication channel in a way that is informative rather than random, and (2) if they do use communication, can come to a common grounding on the meaning of symb... 
https://shortscience.org/paper?bibtexKey=journals/corr/2110.15349#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2110.15349#decodyng Sat, 13 Nov 2021 06:50:28 +0000 2104.11178 journals/corr/2104.11178 4 VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text CodyWild This strikes me as a really straightforward, clever, and exciting paper that uses the supervision intrinsic in the visual, audio, and text streams of a video to train a shared multimodal model. The basic premise is: - Tokenize all three modalities into a sequence of embedding tokens. For video, split into patches, and linearly project the voxels of these patches to get a per-token representation. For audio, a similar strategy but with waveform patches. For text, the normal per-token embeddin... https://shortscience.org/paper?bibtexKey=journals/corr/2104.11178#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2104.11178#decodyng Fri, 12 Nov 2021 06:26:48 +0000 1801.04381 journals/corr/1801.04381 4 Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation CodyWild This work expands on prior techniques for designing models that can both be stored using fewer parameters, and also execute using fewer operations and less memory, both of which are key desiderata for having trained machine learning models be usable on phones and other personal devices. The main contribution of the original MobileNets paper was to introduce the idea of using "factored" decompositions of Depthwise and Pointwise convolutions, which separate the procedures of "pull information fr... 
https://shortscience.org/paper?bibtexKey=journals/corr/1801.04381#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/1801.04381#decodyng Thu, 11 Nov 2021 06:30:30 +0000 2003.10555 DBLP:journals/corr/abs-2003-10555 4 ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators CodyWild I'm a little embarrassed that I'm only just now reading what seems like a fairly important paper from a year and a half ago, but, in my defense, March 2020 was not the best time for keeping up with the literature in a disciplined way. Anyhow, musings aside: this paper proposes an alternative training procedure for large language models, which the authors claim results in models that reach strong performance more efficiently than previous BERT, XLNet, or RoBERTa baselines. As some background con... https://shortscience.org/paper?bibtexKey=DBLP:journals/corr/abs-2003-10555#decodyng https://shortscience.org/paper?bibtexKey=DBLP:journals/corr/abs-2003-10555#decodyng Tue, 09 Nov 2021 03:53:27 +0000 2103.03206 journals/corr/2103.03206 3 Perceiver: General Perception with Iterative Attention CodyWild This new architecture out of DeepMind combines information extraction and bottlenecks with a traditional Transformer base to get a model that can theoretically apply self-attention to meaningfully larger input sizes than earlier architectures allowed. Currently, self-attention models are quite powerful and capable, but because attention is quadratic-in-sequence-length in both time and, often more saliently, memory, it's infeasible to use on long sequences without some modification. This... 
https://shortscience.org/paper?bibtexKey=journals/corr/2103.03206#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2103.03206#decodyng Sun, 07 Nov 2021 03:18:35 +0000 2006.03236 journals/corr/abs-2006-03236 4 Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing CodyWild This was an amusingly-timed paper for me to read, because just yesterday I was listening to a different paper summary where the presenter offhandedly mentioned the idea of compressing the sequence length in Transformers through subsequent layers (the way a ConvNet does pooling to a smaller spatial dimension in the course of learning), and it made me wonder why I hadn't heard much about that as an approach. And, lo, I came on this paper in my list the next day, which does exactly that. As a ref... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-03236#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-03236#decodyng Fri, 05 Nov 2021 05:30:06 +0000 2011.12948 journals/corr/2011.12948 3 Nerfies: Deformable Neural Radiance Fields CodyWild This summary builds substantially on my summary of NERFs, so if you haven't yet read that, I recommend doing so first! The idea of a NERF is to learn a neural network that represents a 3D scene, and from which you can, once the model is trained, sample an image of that scene from any desired angle. This involves structuring your neural network as a function that predicts the RGB color and density/opacity for a given point in 3D space (x, y, z), from a given viewing angle (theta, phi). With such a... 
https://shortscience.org/paper?bibtexKey=journals/corr/2011.12948#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2011.12948#decodyng Thu, 04 Nov 2021 03:08:28 +0000 2003.08934 mildenhall2020representing 4 NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis CodyWild This summary builds extensively on my prior summary of SIRENs, so if you haven't read that summary or the underlying paper yet, I'd recommend doing that first! At a high level, the idea of SIRENs is to use a neural network to learn a compressed, continuous representation of an image, where the neural network encodes a mapping from (x, y) to the pixel value at that location, and the image can be reconstructed (or, potentially, expanded in size) by sampling from that function across the full ran... https://shortscience.org/paper?bibtexKey=mildenhall2020representing#decodyng https://shortscience.org/paper?bibtexKey=mildenhall2020representing#decodyng Wed, 03 Nov 2021 01:19:18 +0000 2006.09661 sitzmann2020implicit 3 Implicit Neural Representations with Periodic Activation Functions CodyWild [First off, full credit that this summary is essentially a distilled-for-my-own-understanding compression of Yannic Kilcher's excellent video on the topic] I'm interested in learning more about Neural Radiance Fields (or NERFs), a recent technique for learning a representation of a scene that lets you generate multiple views from it, and a paper referenced as a useful prerequisite for that technique was SIRENs, or Sinusoidal Representation Networks. In my view, the most complex part of unders... https://shortscience.org/paper?bibtexKey=sitzmann2020implicit#decodyng https://shortscience.org/paper?bibtexKey=sitzmann2020implicit#decodyng Tue, 02 Nov 2021 06:55:52 +0000 2110.15149 journals/corr/2110.15149 2 Diversity-Driven Combination for Grammatical Error Correction Leshem Choshen Model combination/ensembling: Average ensembling is practical - but naive. 
Combine considering each network's strengths, much better! Moreover, let's make the networks diverse so they will have different strengths. Wenjuan Han & Hwee Tou Ng (no twitters?) #enough2skim #NLProc The basic idea is quite simple: Given some models, why would we want the average? We want to rely on each one (or group) when it is more likely to be the correct one. This was actually introduced in our previous work (as a... https://shortscience.org/paper?bibtexKey=journals/corr/2110.15149#borgr https://shortscience.org/paper?bibtexKey=journals/corr/2110.15149#borgr Mon, 01 Nov 2021 06:33:12 +0000 2108.10763 journals/corr/2108.10763 2 ComSum: Commit Messages Summarization and Meaning Preservation Leshem Choshen Huge commit summarization dataset. The dataset cleans tons of open source projects to have only ones with high quality committing habits (e.g. large active projects with commits that are of significant length etc.). We present some ways to evaluate that the meaning was kept while summarizing, so you can go beyond ROUGE. We provide a strict split that keeps some (roughly a thousand) repositories totally out of the training, so you can check in domai... https://shortscience.org/paper?bibtexKey=journals/corr/2108.10763#borgr https://shortscience.org/paper?bibtexKey=journals/corr/2108.10763#borgr Sun, 24 Oct 2021 09:32:17 +0000 2102.09475 journals/corr/2102.09475 3 Gifsplanation via Latent Shift: A Simple Autoencoder Approach to Counterfactual Generation for Chest X-rays Joseph Paul Cohen **Background:** The goal of this work is to indicate image features which are relevant to the prediction of a neural network and convey that information to the user by displaying a counterfactual image animation. **The Latent Shift Method:** This method works on any pretrained encoder/decoder and classifier which is differentiable. No special considerations are needed during model training. 
With this approach they want the exact opposite of an adversarial attack, but it uses the same idea. T... https://shortscience.org/paper?bibtexKey=journals/corr/2102.09475#joecohen https://shortscience.org/paper?bibtexKey=journals/corr/2102.09475#joecohen Fri, 02 Jul 2021 17:19:32 +0000 journals/prl/BailoRJPBK18 3 Efficient adaptive non-maximal suppression algorithms for homogeneous spatial keypoint distribution Oleksandr Bailo Keypoint detection is an important step in various tasks such as SLAM, panorama stitching, camera calibration, and more. Efficient keypoint detectors, FAST (Features from Accelerated Segment Test) for example, would detect keypoints where a relatively high brightness change is observed in relation to surrounding pixels. Most probably, the keypoints would be located on edges, as shown below: Let's consider another image shown below. Here, while the detector is capable of detecting many keyp... https://shortscience.org/paper?bibtexKey=journals/prl/BailoRJPBK18#ukrdailo https://shortscience.org/paper?bibtexKey=journals/prl/BailoRJPBK18#ukrdailo Sun, 07 Feb 2021 10:58:53 +0000 10.1038/s41586-019-1923-7 2 Improved protein structure prediction using potentials from deep learning CodyWild In January of this year (2020), DeepMind released a model called AlphaFold, which uses convolutional networks atop sequence-based and evolutionary features to predict protein folding structure. In particular, their model was designed to predict a distribution for how far away each pair of amino acids will be from one another in the final folded structure. Given such a trained model, you can score a candidate structure according to how likely it is under the model, and - if your process for gener... 
https://shortscience.org/paper?bibtexKey=10.1038/s41586-019-1923-7#decodyng https://shortscience.org/paper?bibtexKey=10.1038/s41586-019-1923-7#decodyng Tue, 01 Dec 2020 02:28:52 +0000 2007.12223 journals/corr/abs-2007-12223 3 The Lottery Ticket Hypothesis for Pre-trained BERT Networks CodyWild This is an interesting paper, investigating (with a team that includes the original authors of the Lottery Ticket paper) whether the initializations that result from BERT pretraining have Lottery Ticket-esque properties with respect to their role as initializations for downstream transfer tasks. As background context, the Lottery Ticket Hypothesis came out of an observation that trained networks could be pruned to remove low-magnitude weights (according to a particular iterative pruning strate... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2007-12223#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2007-12223#decodyng Mon, 30 Nov 2020 01:54:47 +0000 1905.10295 journals/corr/abs-1905-10295 2 Learning to learn via Self-Critique Mikhail Meskhi ### Key points - Instead of just focusing on supervised learning, a self-critique and adapt network provides an unsupervised learning approach to improving overall generalization. It does this via transductive learning by learning a label-free loss function from the validation set to improve the base model. - The SCA framework helps a learning algorithm be more robust by learning more relevant features and improving during the training phase. ### Ideas 1. Combine deep learning models with SC... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-1905-10295#michaelmmeskhi https://shortscience.org/paper?bibtexKey=journals/corr/abs-1905-10295#michaelmmeskhi Sat, 28 Nov 2020 21:58:53 +0000 2006.07589 journals/corr/abs-2006-07589 2 Adversarial Self-Supervised Contrastive Learning CodyWild This is a nice, compact paper testing a straightforward idea: can we use the contrastive loss structure so widespread in unsupervised learning as a framework for generating and training against adversarial examples? In the context of the adversarial examples literature, adversarial training - or, training against examples that were adversarially generated so as to maximize the loss of the model you're training - is the primary strategy used to train robust models (robust here in the sense of not be... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-07589#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-07589#decodyng Sat, 28 Nov 2020 21:00:26 +0000 2007.00224 journals/corr/2007.00224 2 Debiased Contrastive Learning CodyWild The premise of contrastive loss is that we want to push together the representations of objects that are similar, and push dissimilar representations farther apart. However, in an unlabeled setting, we don't generally have class labels to tell which images (or objects in general) are supposed to be similar or dissimilar along the axes that matter to us, so we use the shortcut of defining some transformation on a given anchor frame that gets us a frame we're confident is related enough to that an... 
https://shortscience.org/paper?bibtexKey=journals/corr/2007.00224#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2007.00224#decodyng Fri, 27 Nov 2020 21:00:39 +0000 2007.02835 journals/corr/abs-2007-02835 3 GROVER: Self-supervised Message Passing Transformer on Large-scale Molecular Data CodyWild Large-scale transformers on unsupervised text data have been wildly successful in recent years; arguably, the most successful single idea in the last ~3 years of machine learning. Given that, it's understandable that different domains within ML want to take their shot at seeing whether the same formula will work for them as well. This paper applies the principles of (1) transformers and (2) large-scale unlabeled data to the problem of learning informative embeddings of molecular graphs. Labeli... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2007-02835#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2007-02835#decodyng Thu, 26 Nov 2020 20:44:45 +0000 2004.02860 journals/corr/abs-2004-02860 2 Weakly-Supervised Reinforcement Learning for Controllable Behavior CodyWild I tried my best, but I'm really confused by the central methodology of this paper. Here are the things I do understand: 1. The goal of the method is to learn disentangled representations, and, specifically, to learn representations that correspond to factors of variation in the environment that are selected by humans. That means, we ask humans whether a given image is higher or lower on a particular relevant axis, and aggregate those rankings into a vector, where a particular index of the vect... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-2004-02860#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2004-02860#decodyng Thu, 26 Nov 2020 04:48:23 +0000 2002.11328 yang2020rethinking 2 Rethinking Bias-Variance Trade-off for Generalization of Neural Networks CodyWild This is a really cool paper that posits a relatively simple explanation for the strange phenomenon known as double descent - both the fact of seeing it in the first place, and the difficulty in robustly causing it to appear. In the classical wisdom of statistics, increasing model complexity too far will lead to an increase in variance, and thus an increase in test error (or "test risk" or "empirical risk"), leading to a U-shaped test error curve as a function of model complexity. Double descent is t... https://shortscience.org/paper?bibtexKey=yang2020rethinking#decodyng https://shortscience.org/paper?bibtexKey=yang2020rethinking#decodyng Tue, 24 Nov 2020 05:26:23 +0000 2006.15134 journals/corr/2006.15134 3 Critic Regularized Regression CodyWild Offline reinforcement learning is a potentially high-value thing for the machine learning community to learn to do well, because there are many applications where it'd be useful to generate a learnt policy for responding to a dynamic environment, but where it'd be too unsafe or expensive to learn in an on-policy or online way, where we continually evaluate our actions in the environment to test their value. In such settings, we'd like to be able to take a batch of existing data - collected from a hum... https://shortscience.org/paper?bibtexKey=journals/corr/2006.15134#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2006.15134#decodyng Mon, 23 Nov 2020 05:52:49 +0000 2006.06936 journals/corr/abs-2006-06936 4 Does Unsupervised Architecture Representation Learning Help Neural Architecture Search? 
CodyWild This paper is ultimately relatively straightforward, for all that it's embedded in the somewhat new-to-me literature around graph-based Neural Architecture Search - the problem of iterating through options to find a graph representing an optimized architecture. The authors want to understand whether in this problem, as in many others in deep learning, we can benefit from building our supervised models off of representations learned during an unsupervised pretraining step. In this case, the unsup... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-06936#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-06936#decodyng Sun, 22 Nov 2020 02:10:17 +0000 2006.12433 journals/corr/2006.12433 3 What shapes feature representations? Exploring datasets, architectures, and training CodyWild This is a nice little empirical paper that does some investigation into which features get learned during the course of neural network training. To look at this, it uses a notion of "decodability", defined as the accuracy to which you can train a linear model to predict a given conceptual feature on top of the activations/learned features at a particular layer. This idea captures the amount of information about a conceptual feature that can be extracted from a given set of activations. They wo... https://shortscience.org/paper?bibtexKey=journals/corr/2006.12433#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2006.12433#decodyng Sat, 21 Nov 2020 04:57:58 +0000 2007.01293 ren2020unlabeled 3 Not All Unlabeled Data are Equal: Learning to Weight Data in Semi-supervised Learning CodyWild This paper argues that, in semi-supervised learning, it's suboptimal to use the same weight for all examples (as happens implicitly, when the unsupervised component of the loss for each example is just added together directly). Instead, it tries to learn weights for each specific data example, through a meta-learning-esque process. 
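To make the contrast concrete, here is a minimal sketch of the difference between summing per-example unsupervised losses directly and combining them with per-example weights. The function name, the loss values, and the weight values are all hypothetical, chosen only for illustration; in the paper the weights are learned through a meta-learning-style procedure, which this sketch does not attempt to reproduce.

```python
def weighted_unsupervised_loss(per_example_losses, weights=None):
    """Combine per-example unsupervised losses into a single scalar.

    With weights=None every example gets weight 1.0, which is the
    implicit "same weight for all examples" case of a direct sum.
    """
    if weights is None:
        weights = [1.0] * len(per_example_losses)
    if len(weights) != len(per_example_losses):
        raise ValueError("need exactly one weight per example")
    return sum(w * loss for w, loss in zip(weights, per_example_losses))


# Hypothetical per-example losses for three unlabeled examples.
losses = [0.2, 1.5, 0.7]

# Direct sum: every example contributes equally.
uniform = weighted_unsupervised_loss(losses)

# Per-example weights (made up here; the paper learns them) can
# down-weight an example whose unsupervised signal is less useful.
learned = weighted_unsupervised_loss(losses, weights=[1.0, 0.1, 0.8])
```

The only design point shown is that the uniform case is a special case of the weighted one, so learning the weights strictly generalizes the usual summed loss.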
The form of semi-supervised learning being discussed here is label-based consistency loss, where a labeled image is augmented and run through the current version of ... https://shortscience.org/paper?bibtexKey=ren2020unlabeled#decodyng https://shortscience.org/paper?bibtexKey=ren2020unlabeled#decodyng Fri, 20 Nov 2020 04:05:54 +0000 2007.14062 journals/corr/abs-2007-14062 3 Big Bird: Transformers for Longer Sequences CodyWild Transformers - powered by self-attention mechanisms - have been a paradigm shift in NLP, and are now the standard choice for training large language models. However, while transformers do have many benefits in terms of computational constraints - most saliently, that attention between tokens can be computed in parallel, rather than needing to be evaluated sequentially like in an RNN - a major downside is their memory (and, secondarily, computational) requirements. The baseline form of self-attent... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2007-14062#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2007-14062#decodyng Thu, 19 Nov 2020 02:32:44 +0000 2006.07710 journals/corr/abs-2006-07710 3 The Pitfalls of Simplicity Bias in Neural Networks CodyWild This is an interesting paper that makes a fairly radical claim, and I haven't fully decided whether what they find is an interesting-but-rare corner case, or a more fundamental weakness in the design of neural nets. The claim is: neural nets prefer learning simple features, even if there exist complex features that are equally or more predictive, and even if that means learning a classifier with a smaller margin - where margin means "the distance between the decision boundary and the nearest-by ... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-07710#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-07710#decodyng Sun, 15 Nov 2020 22:46:11 +0000 2010.11924 journals/corr/abs-2010-11924 2 In Search of Robust Measures of Generalization CodyWild Generalization is, if not the central, then at least one of the central mysteries of deep learning. We are somehow able to train high-capacity, overparametrized models that empirically have the capacity to fit to random data - meaning that they have the capacity to memorize the labeled data we give them - and which yet still manage to train functions that generalize to test data. People have tried to come up with generalization bounds - that is, bounds on the expected test error of a mo... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2010-11924#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2010-11924#decodyng Sat, 14 Nov 2020 22:31:16 +0000 2006.06882 journals/corr/abs-2006-06882 3 Rethinking Pre-training and Self-training CodyWild Occasionally, I come across results in machine learning that I'm glad exist, even if I don't fully understand them, precisely because they remind me how little we know about the complicated information architectures we're building, and what kinds of signal they can productively use. This is one such result. The paper tests a method called self-training, and compares it against the more common standard of pre-training. Pre-training works by first training your model on a different dataset, in ... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-06882#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-06882#decodyng Sat, 14 Nov 2020 05:00:22 +0000 2010.02302 journals/corr/abs-2010-02302 2 Latent World Models For Intrinsically Motivated Exploration CodyWild The thing I think is happening here: It proposes a self-supervised learning scheme (which...seems fairly basic, but okay) to generate encodings. It then trains a Latent World Model, which takes in the current state encoding, the action, and the belief state (I think just the prior RNN state?) and predicts a next state. The intrinsic reward is the difference between this and the actual encoding of the next step. (This is dependent on a particular action and resulting next obs, it seems). I don'... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2010-02302#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2010-02302#decodyng Thu, 12 Nov 2020 05:26:18 +0000 1911.09071 journals/corr/abs-1911-09071 3 Exploring the Origins and Prevalence of Texture Bias in Convolutional Neural Networks CodyWild When humans classify images, we tend to use high-level information about the shape and position of the object. However, when convolutional neural networks classify images, they tend to use low-level, or textural, information more than high-level shape information. This paper tries to understand what factors lead to higher shape bias or texture bias. To investigate this, the authors look at three datasets with disagreeing shape and texture labels. The first is GST, or Geirhos Style Transfer. I... https://shortscience.org/paper?bibtexKey=journals/corr/abs-1911-09071#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-1911-09071#decodyng Wed, 11 Nov 2020 07:08:22 +0000 2008.11687 journals/corr/abs-2008-11687 3 What is being transferred in transfer learning? 
CodyWild This is an interesting - and refreshing - paper, in that, instead of trying to go all-in on a particular theoretical point, the authors instead run a battery of empirical investigations, all centered around the question of how to explain what happens to make transfer learning work. The experiments don't all line up to support a single point, but they do illustrate different interesting facets of the transfer process. - An initial experiment tries to understand how much of the performance of fi... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2008-11687#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2008-11687#decodyng Tue, 10 Nov 2020 06:58:27 +0000 2010.12050 journals/corr/abs-2010-12050 3 Contrastive Learning with Adversarial Examples CodyWild Contrastive learning works by performing augmentations on a batch of images, and training a network to match the representations of the two augmented parts of a pair together, and push the representations of images not in a pair farther apart. Historically, these algorithms have benefitted from using stronger augmentations, which has the effect of making the two positive elements in a pair more visually distinct from one another. This paper tries to build on that success, and, beyond just using ... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2010-12050#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2010-12050#decodyng Mon, 09 Nov 2020 02:03:47 +0000 2004.11362 journals/corr/2004.11362 3 Supervised Contrastive Learning CodyWild This was a really cool-to-me paper that asked whether contrastive losses, of the kind that have found widespread success in semi-supervised domains, can add value in a supervised setting as well. 
In a semi-supervised context, contrastive loss works by pushing together the representations of an "anchor" data example with an augmented version of itself (which is taken as a positive or target, because the image is understood to not be substantively changed by being augmented), and pushing the repre... https://shortscience.org/paper?bibtexKey=journals/corr/2004.11362#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2004.11362#decodyng Sat, 07 Nov 2020 23:30:17 +0000 2006.10455 journals/corr/abs-2006-10455 2 What Do Neural Networks Learn When Trained With Random Labels? CodyWild This is another paper that was a bit of a personal-growth test for me to try to parse, since it's definitely heavier on analytical theory than I'm used to, but I think I've been able to get something from it, even though I'll be the first to say I didn't understand it entirely. The question of this paper is: why does it seem to be the case that training a neural network on a data distribution - but with your supervised labels randomly sampled - seems to afford some level of advantage when fine... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-10455#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-10455#decodyng Sat, 07 Nov 2020 00:15:03 +0000 2007.13916 journals/corr/abs-2007-13916 3 Demystifying Contrastive Self-Supervised Learning: Invariances, Augmentations and Dataset Biases CodyWild In the past year or so, contrastive learning has experienced widespread success, and has risen to be a dominant problem framing within self-supervised learning. The basic idea of contrastive learning is that, instead of needing human-generated labels to generate a supervised task, you instead assume that there exists some automated operation you can perform to a data element to generate another data element that, while different, should be considered still fundamentally the same, or at least mor... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-2007-13916#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2007-13916#decodyng Fri, 06 Nov 2020 04:39:42 +0000 2002.00632 journals/corr/abs-2002-00632 3 Effective Diversity in Population-Based Reinforcement Learning CodyWild A central problem in the domain of reinforcement learning is how to incentivize exploration and diversity of experience, since RL agents can typically only learn from states they go to, and it can often be the case that states with high reward don't have an obvious trail of high-reward states leading to them, meaning that algorithms that are naively optimizing for reward will be relatively unlikely to discover them. One potential way to promote exploration is to train an ensemble of agents, and ... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2002-00632#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2002-00632#decodyng Wed, 04 Nov 2020 00:44:40 +0000 2007.08794 journals/corr/2007.08794 3 Discovering Reinforcement Learning Algorithms CodyWild This work attempts to use meta-learning to learn an update rule for a reinforcement learning agent. In this context, "learning an update rule" means learning the parameters of an LSTM module that takes in information about the agent's recent reward and current model and outputs two values - a scalar and a vector - that are used to update the agent's model. I'm not going to go too deep into meta-learning here, but, at a high level, meta learning methods optimize parameters governing an agent's le... 
https://shortscience.org/paper?bibtexKey=journals/corr/2007.08794#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2007.08794#decodyng Tue, 03 Nov 2020 05:29:13 +0000 2006.04635 journals/corr/abs-2006-04635 3 Learning to Play No-Press Diplomacy with Best Response Policy Iteration CodyWild This paper focuses on an effort by a DeepMind team to train an agent that can play the game Diplomacy - a complex, multiplayer game where players play as countries controlling units, trying to take over the map of Europe. Some relevant factors of this game, for the purposes of this paper, are: 1) All players move at the same time, which means you need to model your opponent's current move, and play a move that succeeds in expectation over that predicted move distribution. This also means that,... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-04635#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2006-04635#decodyng Mon, 02 Nov 2020 06:15:17 +0000 10.1101/2020.02.07.938852 3 Tumor Phylogeny Topology Inference via Deep Learning Gavin Gray A very simple (but impractical) discrete model of subclonal evolution would include the following events: * Division of a cell to create two cells: * **Mutation** at a location in the genome of the new cells * Cell death at a new timestep * Cell survival at a new timestep Because measurements of mutations are usually taken at one time point, this is taken to be at the end of a time series of these events, where a tiny subset of cells are observed and a **genotype matrix** $A$ is produce... https://shortscience.org/paper?bibtexKey=10.1101/2020.02.07.938852#gngdb https://shortscience.org/paper?bibtexKey=10.1101/2020.02.07.938852#gngdb Wed, 16 Sep 2020 15:59:52 +0000 1805.08296 journals/corr/1805.08296 2 Data-Efficient Hierarchical Reinforcement Learning Felipe Martins # Keypoints - Proposes the HIerarchical Reinforcement learning with Off-policy correction (**HIRO**) algorithm. 
- Does not require careful task-specific design. - Generic goal representation to make it broadly applicable, without any manual design of goal spaces, primitives, or controllable dimensions. - Use of off-policy experience using a novel off-policy correction. - A two-level hierarchy architecture - A higher-level controller outputs a goal for the lower-level controller every **c** ti... https://shortscience.org/paper?bibtexKey=journals/corr/1805.08296#felipemartins https://shortscience.org/paper?bibtexKey=journals/corr/1805.08296#felipemartins Tue, 01 Sep 2020 00:38:54 +0000 10.1109/isbi45749.2020.9098686 2 Bayesian Skip-Autoencoders for Unsupervised Hyperintense Anomaly Detection in High Resolution Brain MRI Friedrich-Maximilian Weberling The high-fidelity reconstruction of high-resolution brain MR images is especially challenging because of the highly complex brain structure. The most promising approaches for this task are autoencoders and generative models such as Variational Autoencoders (VAE) or Generative Adversarial Networks (GAN). In Unsupervised Anomaly Detection (UAD), these architectures are only trained with images of healthy brain anatomy and not with images containing anomalies such as lesions. Therefore, processing an anomal... https://shortscience.org/paper?bibtexKey=10.1109/isbi45749.2020.9098686#fweberling1995 https://shortscience.org/paper?bibtexKey=10.1109/isbi45749.2020.9098686#fweberling1995 Mon, 31 Aug 2020 09:18:08 +0000 1809.01999 journals/corr/1809.01999 2 Recurrent World Models Facilitate Policy Evolution Paul Barde ## General Framework The take-home message is that the challenge of Reinforcement Learning for environments with high-dimensional and partial observations is learning a good representation of the environment. This means learning a sensory feature extractor V to deal with the high-dimensional observations (pixels for example). 
But also learning a temporal representation M of the environment dynamics to deal with the partial observability. If provided with such representations, learning a contr... https://shortscience.org/paper?bibtexKey=journals/corr/1809.01999#muntermulehitch https://shortscience.org/paper?bibtexKey=journals/corr/1809.01999#muntermulehitch Mon, 27 Jul 2020 13:05:14 +0000 1907.03976 journals/corr/1907.03976 3 Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations Paul Barde ## General Framework Extends T-REX (see [summary]()) so that preferences (rankings) over demonstrations are generated automatically (back to the common IL/IRL setting where we only have access to a set of unlabeled demonstrations). Also derives some theoretical requirements and guarantees for better-than-demonstrator performance. ## Motivations * Preferences over demonstrations may be difficult to obtain in practice. * There is no theoretical understanding of the requirements that lead to out... https://shortscience.org/paper?bibtexKey=journals/corr/1907.03976#muntermulehitch https://shortscience.org/paper?bibtexKey=journals/corr/1907.03976#muntermulehitch Mon, 27 Jul 2020 02:22:27 +0000 1904.06387 journals/corr/1904.06387 2 Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations Paul Barde ## General Framework Only access to a finite set of **ranked demonstrations**. The demonstrations contain only **observations** and **do not need to be optimal** but must be (approximately) ranked from worst to best. The **reward learning part is off-line** but not the policy learning part (requires interactions with the environment). In a nutshell: learns a reward model that looks at observations. The reward model is trained to predict if a demonstration's ranking is greater than another on... 
https://shortscience.org/paper?bibtexKey=journals/corr/1904.06387#muntermulehitch https://shortscience.org/paper?bibtexKey=journals/corr/1904.06387#muntermulehitch Mon, 27 Jul 2020 02:18:47 +0000 10.15607/rss.2016.xii.029 2 Planning for Autonomous Cars that Leverage Effects on Human Actions Paul Barde ## General Framework *wording: car = the autonomous car, driver = the other car it is interacting with* Builds a model of an **autonomous car's influence over the behavior of an interacting driver** (human or simulated) that the autonomous car can leverage to plan more efficiently. The driver is modeled by the policy that maximizes his defined objective. In brief, a **linear reward function is learned off-line with IRL on human demonstrations** and the modeled policy takes the actions that max... https://shortscience.org/paper?bibtexKey=10.15607/rss.2016.xii.029#muntermulehitch https://shortscience.org/paper?bibtexKey=10.15607/rss.2016.xii.029#muntermulehitch Mon, 27 Jul 2020 02:14:17 +0000 1406.5979 journals/corr/1406.5979 2 Reinforcement and Imitation Learning via Interactive No-Regret Learning Paul Barde ## General Framework Really **similar to DAgger** (see [summary]()) but considers **cost-sensitive classification** ("some mistakes are worse than others": you should be more careful in imitating that particular action of the expert if failing in doing so incurs a large cost-to-go). By doing so they improve from DAgger's bound of $\epsilon_{class}uT$ where $u$ is the difference in cost-to-go (between the expert and one error followed by expert policy) to $\epsilon_{class}T$ where $\epsilon_{cla... 
https://shortscience.org/paper?bibtexKey=journals/corr/1406.5979#muntermulehitch https://shortscience.org/paper?bibtexKey=journals/corr/1406.5979#muntermulehitch Mon, 27 Jul 2020 02:08:30 +0000 1011.0686 journals/corr/1011.0686 2 A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning Paul Barde ## General Framework The imitation learning problem is here cast into a classification problem: label the state with the corresponding expert action. With this, you can see structured prediction (predict next label knowing your previous prediction) as a degenerate IL problem. They make the **reduction assumption** that you can make the probability of mistake $\epsilon$ as small as desired on the **training distribution** (expert or mixture). They also assume that the difference in the cost-to-g... https://shortscience.org/paper?bibtexKey=journals/corr/1011.0686#muntermulehitch https://shortscience.org/paper?bibtexKey=journals/corr/1011.0686#muntermulehitch Mon, 27 Jul 2020 01:53:35 +0000 1611.03530 journals/corr/1611.03530 2 Understanding deep learning requires rethinking generalization ANIRUDH NJ ## Summary The broad goal of this paper is to understand how a neural network learns the underlying distribution of the input data and the properties of the network that describe its generalization power. Previous literature tries to use statistical measures like Rademacher complexity, uniform stability and VC dimension to explain the generalization error of the model. These methods explain generalization in terms of the number of parameters in the model along with the applied regularizat... https://shortscience.org/paper?bibtexKey=journals/corr/1611.03530#anirudhnj https://shortscience.org/paper?bibtexKey=journals/corr/1611.03530#anirudhnj Fri, 26 Jun 2020 15:33:03 +0000 journals/af/Maymin11 2 Markets are efficient if and only if P = NP quaxton Is the market efficient? This is perhaps the most prevalent question in all of finance. 
While this paper does not aim to answer that question, it does frame it in an information-theoretic context. Mainly, Maymin shows that at least the weak form of the efficient market hypothesis (EMH) holds if and only if P = NP. First, he defines what efficient market means: "The weakest form of the EMH states that future prices cannot be predicted by analyzing prices from the past. Therefore, technical ana... https://shortscience.org/paper?bibtexKey=journals/af/Maymin11#jyang772 https://shortscience.org/paper?bibtexKey=journals/af/Maymin11#jyang772 Thu, 04 Jun 2020 02:53:53 +0000 conf/iclr/RendaFC20 3 Comparing Rewinding and Fine-tuning in Neural Network Pruning CodyWild This is an interestingly pragmatic paper that makes a super simple observation. Often, we may want a usable network with fewer parameters, to make our network more easily usable on small devices. It's been observed (by these same authors, in fact), that pruned networks can achieve comparable accuracy to their fully trained counterparts if you rewind and retrain from early in the training process, to compensate for the loss of the (not ultimately important) pruned weights. This observation has bee... https://shortscience.org/paper?bibtexKey=conf/iclr/RendaFC20#decodyng https://shortscience.org/paper?bibtexKey=conf/iclr/RendaFC20#decodyng Fri, 15 May 2020 03:18:21 +0000 2004.13649 journals/corr/2004.13649 2 Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels CodyWild One of the most notable flaws of modern model-free reinforcement learning is its sample inefficiency; where humans can learn a new task with relatively few examples, models that learn policies or value functions directly from raw data need huge amounts of data to train properly. Because the model isn't given any semantic features, it has to learn a meaningful representation from raw pixels using only the (often sparse, often noisy) signal of reward. Some past approaches have tried learning repres... 
https://shortscience.org/paper?bibtexKey=journals/corr/2004.13649#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2004.13649#decodyng Sun, 10 May 2020 05:46:18 +0000 1903.11981 journals/corr/abs-1903-11981 3 Regularizing Trajectory Optimization with Denoising Autoencoders Robert Müller The typical model-based reinforcement learning (RL) loop consists of collecting data, training a model of the environment, and using the model to do model predictive control (MPC). If, however, the dynamics model is wrong, for example for state-action pairs that have barely been visited, the MPC fails because the imagined model and reality no longer align. Boney et al. propose to tackle this with a denoising autoencoder for trajectory regularization according to the fam... https://shortscience.org/paper?bibtexKey=journals/corr/abs-1903-11981#robertmueller https://shortscience.org/paper?bibtexKey=journals/corr/abs-1903-11981#robertmueller Thu, 07 May 2020 08:08:00 +0000 1912.05500 journals/corr/abs-1912-05500 2 What Can Learned Intrinsic Rewards Capture? CodyWild This paper out of DeepMind is an interesting synthesis of ideas out of the research areas of meta learning and intrinsic rewards. The hope for intrinsic reward structures in reinforcement learning - things like uncertainty reduction or curiosity - is that they can incentivize behavior like information-gathering and exploration, which aren't incentivized by the explicit reward in the short run, but which can lead to higher total reward in the long run. So far, intrinsic rewards have mostly been ... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-1912-05500#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-1912-05500#decodyng Tue, 05 May 2020 06:22:03 +0000 conf/icml/FinnAL17 2 Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks Andrea Walter Ruggerini ## TL;DR The paper presents a model-agnostic strategy to perform few-shot learning taking advantage of prior knowledge acquired during multitask learning. Such prior knowledge derives from priors acquired about generalized model parameters (e.g. weights or hyperparameters) during the Model Agnostic Meta-Learning (MAML) algorithm. The strategy can be applied to any algorithm trained with gradient descent (not only neural networks) being more general and perhaps effective than transfer learnin... https://shortscience.org/paper?bibtexKey=conf/icml/FinnAL17#andreaw https://shortscience.org/paper?bibtexKey=conf/icml/FinnAL17#andreaw Sun, 03 May 2020 14:29:05 +0000 2001.04451 journals/corr/2001.04451 2 Reformer: The Efficient Transformer CodyWild The Transformer architecture - which uses a structure entirely based on key-value attention mechanisms to process sequences such as text - has taken over the worlds of language modeling and NLP in the past three years. However, Transformers at the scale used for large language models have huge computational and memory requirements. This is largely driven by the fact that information at every step in the sequence (or, in the so-far-generated sequence during generation) is used to inform the rep... https://shortscience.org/paper?bibtexKey=journals/corr/2001.04451#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/2001.04451#decodyng Sun, 03 May 2020 05:14:23 +0000 1909.11655 journals/corr/abs-1909-11655 2 Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space CodyWild I found this paper a bit difficult to fully understand. 
Its premise, as far as I can follow, is that we may want to use genetic algorithms (GA), where we make modifications to elements in a population, and keep elements around at a rate proportional to some set of their desirable properties. In particular we might want to use this approach for constructing molecules that have properties (or predicted properties) we want. However, a downside of GA is that it's easy to end up in local minima, where... https://shortscience.org/paper?bibtexKey=journals/corr/abs-1909-11655#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-1909-11655#decodyng Fri, 01 May 2020 05:38:46 +0000 conf/nips/KumarFSTL19 3 Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction Robert Müller Kumar et al. propose an algorithm to learn in batch reinforcement learning (RL), a setting where an agent learns purely from a fixed batch of data, $B$, without any interactions with the environment. The data in the batch is collected according to a batch policy $\pi_b$. Whereas most previous methods (like BCQ) constrain the learned policy to stay close to the behavior policy, Kumar et al. propose bootstrapping error accumulation reduction (BEAR), which constrains the newly learned policy to pl... https://shortscience.org/paper?bibtexKey=conf/nips/KumarFSTL19#robertmueller https://shortscience.org/paper?bibtexKey=conf/nips/KumarFSTL19#robertmueller Thu, 30 Apr 2020 13:31:29 +0000 10.1101/2020.03.03.972133 2 AI-aided design of novel targeted covalent inhibitors against SARS-CoV-2 CodyWild This preprint is a bit rambling, and I don't know that I fully followed what it was doing, but here's my best guess: - We think it's probably the case that SARS-CoV-2 (COVID-19) uses a protease (enzyme involved in its reproduction) that isn't available and co-optable in the human body, and is also quite similar to the comparable protease protein in the original SARS virus. 
Therefore, it is hoped that we might be able to take inhibitors that bind to SARS, and modify them in small ways to make t... https://shortscience.org/paper?bibtexKey=10.1101/2020.03.03.972133#decodyng https://shortscience.org/paper?bibtexKey=10.1101/2020.03.03.972133#decodyng Thu, 30 Apr 2020 04:36:33 +0000 2003.03123 journals/corr/abs-2003-03123 2 Directional Message Passing for Molecular Graphs CodyWild This paper, presented this week at ICLR 2020, builds on existing applications of message-passing Graph Neural Networks (GNN) for molecular modeling (specifically: for predicting quantum properties of molecules), and extends them by introducing a way to represent angles between atoms, rather than just distances between them, as current methods are limited to. The basic version of GNNs on molecule data works by creating features attached to atoms at each level (starting at level 0 with the eleme... https://shortscience.org/paper?bibtexKey=journals/corr/abs-2003-03123#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-2003-03123#decodyng Wed, 29 Apr 2020 03:42:52 +0000 1911.11361 journals/corr/abs-1911-11361 3 Behavior Regularized Offline Reinforcement Learning Robert Müller Wu et al. provide a framework (behavior regularized actor critic (BRAC)) which they use to empirically study the impact of different design choices in batch reinforcement learning (RL). Specific instantiations of the framework include BCQ, KL-Control and BEAR. Pure off-policy rl describes the problem of learning a policy purely from a batch $B$ of one step transitions collected with a behavior policy $\pi_b$. The setting allows for no further interactions with the environment. This learning re... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-1911-11361#robertmueller https://shortscience.org/paper?bibtexKey=journals/corr/abs-1911-11361#robertmueller Mon, 27 Apr 2020 13:02:23 +0000 1908.06760 journals/corr/abs-1908-06760 2 Self-Attention Based Molecule Representation for Predicting Drug-Target Interaction CodyWild In the last three years, Transformers, or models based entirely on attention for aggregating information from across multiple places in a sequence, have taken over the world of NLP. In this paper, the authors propose using a Transformer to learn a molecular representation, and then building a model to predict drug/target interaction on top of that learned representation. A drug/target interaction model takes in two inputs - a protein involved in a disease pathway, and a (typically small) molecul... https://shortscience.org/paper?bibtexKey=journals/corr/abs-1908-06760#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-1908-06760#decodyng Sun, 26 Apr 2020 06:39:30 +0000 journals/iacr/BellareRRS09 2 Format-Preserving Encryption quaxton Format-preserving encryption is a deterministic encryption scheme that encrypts plaintext of some specified format into ciphertext of the same format. This has a lot of practical use cases such as storing SSNs or credit card information, without having to change the underlying schema of the database or application that stores the data. The protected data is indistinguishable from unprotected data, and still enables some analytics over it, such as with masking (i.e., displaying last four digits ... https://shortscience.org/paper?bibtexKey=journals/iacr/BellareRRS09#jyang772 https://shortscience.org/paper?bibtexKey=journals/iacr/BellareRRS09#jyang772 Thu, 23 Apr 2020 22:05:16 +0000 conf/ac/Rasmussen03 4 Gaussian Processes in Machine Learning Friedrich-Maximilian Weberling In this tutorial paper, Carl E. 
Rasmussen gives an introduction to Gaussian Process Regression focusing on the definition, the hyperparameter learning and future research directions. A Gaussian Process is completely defined by its mean function $m(\pmb{x})$ and its covariance function (kernel) $k(\pmb{x},\pmb{x}')$. The mean function $m(\pmb{x})$ corresponds to the mean vector $\pmb{\mu}$ of a Gaussian distribution whereas the covariance function $k(\pmb{x}, \pmb{x}')$ corresponds to the covari... https://shortscience.org/paper?bibtexKey=conf/ac/Rasmussen03#fweberling1995 https://shortscience.org/paper?bibtexKey=conf/ac/Rasmussen03#fweberling1995 Tue, 21 Apr 2020 20:05:41 +0000 1903.08254 journals/corr/abs-1903-08254 3 Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables Robert Müller Rakelly et al. propose a method to do off-policy meta reinforcement learning (rl). The method achieves a 20-100x improvement on sample efficiency compared to on-policy meta rl like MAML+TRPO. The key difficulty for offline meta rl arises from the meta-learning assumption that meta-training and meta-test time match. However, at test time the policy has to explore and therefore sees on-policy data, in contrast to the off-policy data that should be used at meta-training. The key contrib... https://shortscience.org/paper?bibtexKey=journals/corr/abs-1903-08254#robertmueller https://shortscience.org/paper?bibtexKey=journals/corr/abs-1903-08254#robertmueller Tue, 21 Apr 2020 08:39:21 +0000 10.1093/bioinformatics/bty573 2 Predicting protein–protein interactions through sequence-based deep learning CodyWild Most of the interesting mechanics within living things are mediated by interactions between proteins, making it important and useful to have good predictive models of whether proteins will interact with one another, for validating possible interaction graph structures. 
Prior methods for this problem - which takes as its input sequence representations of two proteins, and outputs a probability of interaction - have pursued different ideas for how to combine information from the two proteins. On... https://shortscience.org/paper?bibtexKey=10.1093/bioinformatics/bty573#decodyng https://shortscience.org/paper?bibtexKey=10.1093/bioinformatics/bty573#decodyng Tue, 21 Apr 2020 06:36:31 +0000 1906.05374 journals/corr/1906.05374 3 Meta-Learning via Learned Loss Robert Müller Bechtle et al. propose meta learning via learned loss ($ML^3$) and derive and empirically evaluate the framework on classification, regression, model-based and model-free reinforcement learning tasks. The problem is formalized as learning parameters $\Phi$ of a meta loss function $M_{\Phi}$ that computes loss values $L_{learned} = M_{\Phi}(y, f_{\theta}(x))$. Following the outer-inner loop meta algorithm design, the learned loss $L_{learned}$ is used to update the parameters of the learner in the... https://shortscience.org/paper?bibtexKey=journals/corr/1906.05374#robertmueller https://shortscience.org/paper?bibtexKey=journals/corr/1906.05374#robertmueller Mon, 20 Apr 2020 16:28:20 +0000 1802.04364 journals/corr/abs-1802-04364 2 Junction Tree Variational Autoencoder for Molecular Graph Generation CodyWild Prior to this paper, most methods that used machine learning to generate molecular blueprints did so using SMILES representations - a string format with characters representing different atoms and bond types. This preference came about because ML had existing methods for generating strings that could be built on for generating SMILES (a particular syntax of string). However, an arguably more accurate and fundamental way of representing molecules is as graphs (with atoms as nodes and bonds as edg... 
https://shortscience.org/paper?bibtexKey=journals/corr/abs-1802-04364#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/abs-1802-04364#decodyng Mon, 20 Apr 2020 04:48:28 +0000 1705.10843 journals/corr/GuimaraesSFA17 2 Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models CodyWild This paper's proposed method, the cleverly named ORGAN, combines techniques from GANs and reinforcement learning to generate candidate molecular sequences that incentivize desirable properties while still remaining plausibly on-distribution. Prior papers I've read on molecular generation have by and large used approaches based in maximum likelihood estimation (MLE) - where you construct some distribution over molecular representations, and maximize the probability of your true data under that ... https://shortscience.org/paper?bibtexKey=journals/corr/GuimaraesSFA17#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/GuimaraesSFA17#decodyng Sat, 18 Apr 2020 04:57:12 +0000 journals/jcheminf/OlivecronaBEC17 2 Molecular de-novo design through deep reinforcement learning CodyWild Over the past few days, I've been reading about different generative neural networks being tried out for molecular generation. So far this has mostly focused on latent variable space models like autoencoders, but today I shifted attention to a different approach rooted in reinforcement learning. The goal of most of these methods is 1) to build a generative model that can sample plausible molecular structures, but more saliently 2) specifically generate molecules optimized to exhibit some propert... 
https://shortscience.org/paper?bibtexKey=journals/jcheminf/OlivecronaBEC17#decodyng https://shortscience.org/paper?bibtexKey=journals/jcheminf/OlivecronaBEC17#decodyng Fri, 17 Apr 2020 06:00:27 +0000 1908.09791 journals/corr/abs-1908-09791 2 Once for All: Train One Network and Specialize it for Efficient Deployment ameroyer **Summary**: The goal of this work is to propose a "Once-for-all" (OFA) network: a large network which is trained such that its subnetworks (subsets of the network with smaller width, convolutional kernel sizes, shallower units) are also trained towards the target task. This allows adapting the architecture to a given budget at inference time while preserving performance. **Elastic Parameters.** The goal is to train a large architecture that contains several well-trained subnetworks with dif... https://shortscience.org/paper?bibtexKey=journals/corr/abs-1908-09791#ameroyer https://shortscience.org/paper?bibtexKey=journals/corr/abs-1908-09791#ameroyer Thu, 16 Apr 2020 17:48:55 +0000 1610.02415 journals/corr/Gomez-Bombarelli16 3 Automatic chemical design using a data-driven continuous representation of molecules CodyWild I'll admit that I found this paper a bit of a letdown to read, relative to expectations rooted in its high citation count, and my general excitement and interest to see how deep learning could be brought to bear on molecular design. But before a critique, let's first walk through the mechanics of how the authors' approach works. The method proposed is basically a very straightforward Variational Auto Encoder, or VAE. It takes in a textual SMILES string representation of a molecular structure,... 
https://shortscience.org/paper?bibtexKey=journals/corr/Gomez-Bombarelli16#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/Gomez-Bombarelli16#decodyng Wed, 15 Apr 2020 03:11:44 +0000 journals/iacr/BrakerskiV11 2 Efficient Fully Homomorphic Encryption from (Standard) LWE quaxton Brakerski and Vaikuntanathan introduce a fully homomorphic encryption scheme (FHE) based solely on the decisional learning with errors (LWE) security assumption, moving away from the relatively obscure mathematics of ideal lattices. They introduce relinearization and modulus switching techniques for dimensionality reduction and for removing the “squashing” step of Craig Gentry’s FHE scheme. BV11 and other similar schemes are commonly referred to as “Second generation FHE” schemes. R... https://shortscience.org/paper?bibtexKey=journals/iacr/BrakerskiV11#jyang772 https://shortscience.org/paper?bibtexKey=journals/iacr/BrakerskiV11#jyang772 Mon, 13 Apr 2020 02:16:23 +0000 1704.01212 journals/corr/GilmerSRVD17 4 Neural Message Passing for Quantum Chemistry CodyWild In the years before this paper came out in 2017, a number of different graph convolution architectures - which use weight-sharing and order-invariant operations to create representations at nodes in a graph that are contextualized by information in the rest of the graph - had been suggested for learning representations of molecules. The authors of this paper out of Google sought to pull all of these proposed models into a single conceptual framework, for the sake of better comparing and testing ... 
https://shortscience.org/paper?bibtexKey=journals/corr/GilmerSRVD17#decodyng https://shortscience.org/paper?bibtexKey=journals/corr/GilmerSRVD17#decodyng Fri, 10 Apr 2020 06:05:16 +0000 1708.09259 journals/corr/1708.09259 2 Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet hanoch kremer ScatterNets incorporate geometric knowledge of images to produce discriminative and invariant (translation and rotation) features, i.e. edge information. The first layers of a CNN produce the same outcome. So why not replace the first layer(s) with an equivalent, fixed structure and let the optimizer find the best weights for the CNN with its leading edge removed? The main motivations of the idea of replacing the first convolutional, ReLU and pooling layers of the CNN with a two-layer parametric log-b... https://shortscience.org/paper?bibtexKey=journals/corr/1708.09259#hanochkremer https://shortscience.org/paper?bibtexKey=journals/corr/1708.09259#hanochkremer Thu, 09 Apr 2020 12:05:38 +0000