Kirill Pevzner's profile - ShortScience.org


Adversarial Training and Dilated Convolutions for Brain MRI Segmentation
Moeskops, Pim and Veta, Mitko and Lafarge, Maxime W. and Eppenhof, Koen A. J. and Pluim, Josien P. W.
Medical Image Computing and Computer Assisted Interventions Conference - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Problem
=========
Brain MRI segmentation using an adversarial training approach.

Dataset
======
55 T1-weighted brain MR images (35 adults and 20 elderly subjects) with their respective label maps.


Contributions
==========
1. The authors suggest an adversarial loss in addition to the traditional loss.
2. The authors compare 2 Generator (Segmentor) models - Fully convolutional and dilated networks.

https://i.imgur.com/orhWhoM.png

Dilated network
------------------
Using dilated conv layers allows for a larger receptive field with fewer trainable weights (compared to the FCN option).


However, the authors claim the adversarial loss contributes more when applied to the FCN model.
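
The receptive-field claim for dilated layers can be checked with a little arithmetic. For stride-1 convolutions, each layer with kernel size k and dilation d adds (k - 1) * d pixels to the receptive field. The sketch below compares two hypothetical stacks of seven 3x3 layers, which have the same number of weights per layer; the layer configuration is an illustrative assumption, not necessarily the architecture from the paper.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (in pixels, along one axis) of stacked stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        # A kernel of size k with dilation d spans (k - 1) * d + 1 pixels.
        rf += (k - 1) * d
    return rf


# Hypothetical configurations: seven 3x3 layers each.
plain_rf = receptive_field([3] * 7, [1] * 7)
dilated_rf = receptive_field([3] * 7, [1, 1, 2, 4, 8, 16, 1])
```

With identical weight counts, the dilated stack covers a much wider context (67 pixels versus 15 in this toy configuration).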


High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
Wang, Ting-Chun and Liu, Ming-Yu and Zhu, Jun-Yan and Tao, Andrew and Kautz, Jan and Catanzaro, Bryan
arXiv e-Print archive - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

https://i.imgur.com/lM3EjK9.png
Problem
=============
Label map (semantic segmentation) to realistic image using GANs.

Contributions
=========
1. Coarse-to-fine generator
2. Multi-scale discriminator
3. Robust adversarial learning objective function


Coarse-to-fine Generator
=================
https://i.imgur.com/osEyGOj.png

G1 - Global generator

G2 - Local enhancer

Global Generator:
1. convolutional front-end
2. set of residual blocks
3. transposed convolutional back-end

A semantic label map is passed through the 3 components sequentially.


Local Enhancer:
1. convolutional front-end
2. set of residual blocks
3. transposed convolutional back-end


Training scheme:
1. Train standalone global generator 
2. Freeze global generator weights, train local enhancer
3. Fine-tune all weights together

Multi-scale Discriminator
===================
https://i.imgur.com/hNP1cni.png

To allow for global context but work at higher resolution as well, several discriminators are applied at different image scales.
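
A minimal sketch of how the inputs for several discriminators could be prepared, assuming each scale is produced by 2x average pooling (the pooling choice, the number of scales and the function names are assumptions for illustration). Each discriminator would then receive one level of the pyramid.

```python
import numpy as np


def downsample2x(img):
    """Average-pool an (H, W, C) image by a factor of 2 in each spatial dimension."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]  # drop an odd trailing row/column if present
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))


def image_pyramid(img, num_scales=3):
    """Return [full-res, 1/2-res, 1/4-res, ...], one entry per discriminator."""
    levels = [img]
    for _ in range(num_scales - 1):
        levels.append(downsample2x(levels[-1]))
    return levels


pyramid = image_pyramid(np.random.rand(64, 128, 3))
shapes = [level.shape for level in pyramid]
```

The coarsest level gives a discriminator with a fixed-size receptive field a more global view, while the finest level lets another discriminator judge details.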


Robust adversarial learning objective function
===============================
https://i.imgur.com/j7CIbV3.png
* Compare original and generated images in feature space at different scales. 
* This is done to ensure more abstract resemblance, not just pixel-space resemblance.
* For feature extraction the discriminator is used.
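
The feature-space comparison above can be sketched as an L1 distance summed over intermediate discriminator layers. This is a simplified illustration: the per-layer feature lists are assumed to be already extracted, and the equal weighting of layers is an assumption.

```python
import numpy as np


def feature_matching_loss(real_feats, fake_feats):
    """Sum over layers of the mean absolute difference between feature maps.

    real_feats / fake_feats: lists of arrays, one per discriminator layer,
    computed on the real image and on the generated image respectively.
    """
    if len(real_feats) != len(fake_feats):
        raise ValueError("feature lists must have the same number of layers")
    return sum(float(np.mean(np.abs(r - f))) for r, f in zip(real_feats, fake_feats))


# Toy features for two layers (stand-ins for real discriminator activations).
real = [np.ones((2, 2)), np.zeros(3)]
fake = [np.zeros((2, 2)), np.zeros(3)]
loss = feature_matching_loss(real, fake)
```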


Progressive Growing of GANs for Improved Quality, Stability, and Variation
Karras, Tero and Aila, Timo and Laine, Samuli and Lehtinen, Jaakko
arXiv e-Print archive - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Contribution
------------
1. New GAN training methodology - progressively going from low-res to hi-res, adding additional layers to the model.
https://i.imgur.com/2rQcnH1.png

2. When new layers are introduced during training, they are gradually faded in using a coefficient.
https://i.imgur.com/iuVaN1H.png

3. Increasing the variation of generated images by computing the minibatch standard deviation in the discriminator.
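
A rough sketch of items 2 and 3, assuming (N, C, H, W) arrays, nearest-neighbour upsampling and a single statistic averaged over all features and positions. Function names and these simplifications are assumptions for illustration.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (N, C, H, W) array."""
    return x.repeat(2, axis=2).repeat(2, axis=3)


def fade_in(old_low_res, new_high_res, alpha):
    """Blend the upsampled output of the old layers with the new layer's output.

    alpha grows from 0 to 1 during training, so the new layer is introduced smoothly.
    """
    return (1.0 - alpha) * upsample2x(old_low_res) + alpha * new_high_res


def minibatch_stddev(features):
    """Append one feature map holding the average std-dev across the minibatch."""
    mean_std = features.std(axis=0).mean()  # std over the batch, averaged to a scalar
    extra = np.full((features.shape[0], 1) + features.shape[2:], mean_std)
    return np.concatenate([features, extra], axis=1)


blended = fade_in(np.ones((2, 3, 4, 4)), np.zeros((2, 3, 8, 8)), alpha=0.25)
feats = np.stack([np.zeros((1, 2, 2)), 2.0 * np.ones((1, 2, 2))])
with_std = minibatch_stddev(feats)
```

The extra feature map gives the discriminator a direct signal about how much the samples in a batch differ from each other, so a generator that collapses to near-identical images becomes easy to spot.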




Datasets
---------------------
* CELEBA
* LSUN
* CIFAR10


Learning from Simulated and Unsupervised Images through Adversarial Training
Shrivastava, Ashish and Pfister, Tomas and Tuzel, Oncel and Susskind, Josh and Wang, Wenda and Webb, Russell
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Problem
--------------
Refine synthetically simulated images so that they look real.

https://machinelearning.apple.com/images/journals/gan/real_synt_refined_gaze.png

Approach
--------------
* Generative adversarial networks

Contributions
----------
1. **Refiner** FCN that turns a simulated image into a realistic-looking image
2. **Adversarial + Self regularization loss**
* **Adversarial loss** term = CNN that classifies whether the image is refined or real
* **Self regularization** term = L1 distance of refiner produced image from simulated image. The distance can be either in pixel space or in feature space (to preserve gaze direction for example).
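
The combined refiner objective can be sketched as follows, assuming the pixel-space variant of the self-regularization term. Here `p_real` (the discriminator's probability that the refined image is real) and the weight `lam` are hypothetical names introduced for illustration.

```python
import numpy as np


def refiner_loss(p_real, refined, simulated, lam=1.0):
    """Adversarial term plus L1 self-regularization (pixel-space variant).

    p_real: discriminator's estimated probability that `refined` is a real image.
    lam: weight of the self-regularization term (assumed hyperparameter).
    """
    adversarial = -np.log(p_real)                     # small when the discriminator is fooled
    self_reg = np.abs(refined - simulated).sum()      # keep the refined image close to its source
    return float(adversarial + lam * self_reg)


simulated = np.zeros((2, 2))
refined = np.full((2, 2), 0.5)
loss = refiner_loss(p_real=1.0, refined=refined, simulated=simulated, lam=0.1)
```

The self-regularization term is what preserves the annotation of the simulated image (for example the gaze direction) while the adversarial term pushes the appearance towards real data.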


https://i.imgur.com/I4KxCzT.png

Datasets
------------
* grayscale eye images
* depth sensor hand images 


Technical Contributions
-------------------------------
1. **Local adversarial loss** - The discriminator is applied to image patches, thus producing multiple "realness" metrics
https://machinelearning.apple.com/images/journals/gan/local-d.png

2. **Discriminator with history** - to prevent the refiner from going back to previously used refined images.

https://machinelearning.apple.com/images/journals/gan/history.gif
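
A minimal sketch of a history buffer for the discriminator, assuming half of each discriminator batch is drawn from previously refined images and half of the current batch is stored back. The class name, the replacement policy and the batch split are assumptions for illustration.

```python
import random


class HistoryBuffer:
    """Keeps a bounded pool of previously refined images for the discriminator."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.rng = random.Random(seed)

    def mix(self, current):
        """Return a discriminator batch and update the buffer with current images."""
        half = len(current) // 2
        if half > 0 and len(self.items) >= half:
            # Half of the batch comes from the current refiner, half from history.
            old = self.rng.sample(self.items, half)
            batch = list(current[: len(current) - half]) + old
        else:
            batch = list(current)  # not enough history yet
        for img in current[:half]:
            if len(self.items) < self.capacity:
                self.items.append(img)
            else:
                # Buffer full: overwrite a random old entry.
                self.items[self.rng.randrange(self.capacity)] = img
        return batch


buf = HistoryBuffer(capacity=4)
first = buf.mix(["a0", "a1", "a2", "a3"])
second = buf.mix(["b0", "b1", "b2", "b3"])
```

Strings stand in for images here; any objects would work the same way.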


Deep MR to CT Synthesis using Unpaired Data
Jelmer M. Wolterink and Anna M. Dinkla and Mark H. F. Savenije and Peter R. Seevinck and Cornelis A. T. van den Berg and Ivana Isgum
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.CV

[link] Summary by Kirill Pevzner 6 years ago

https://i.imgur.com/vxBhb7B.png

Problem
------------
Convert MR scans to CT scans.


General Approach
----------
CycleGAN


Dataset
-----------
Unpaired brain CT and MR images.
The dataset contains both CT and MR scans of the same patient taken on the same day.
The volumes are aligned using mutual information and contain some minor local misalignments.

Method
--------
Train the following models:
1. Syn_ct: CNN: MR -> CT
2. Syn_mr: CNN: CT -> MR
3. Dis_ct: classify real and synthetic CT images (result of Syn_ct)
4. Dis_mr: classify real and synthetic MR images, where a synthetic MR image is either Syn_mr(Syn_ct(MR image)) or Syn_mr(CT image)
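
Since the general approach is CycleGAN, the two synthesis networks are additionally tied together by a cycle-consistency term. A minimal sketch, assuming an L1 penalty and with the two networks passed in as plain callables (the toy linear "networks" below are placeholders, not the models from the paper):

```python
import numpy as np


def cycle_loss(syn_ct, syn_mr, mr, ct):
    """L1 cycle-consistency: MR -> CT -> MR and CT -> MR -> CT should reproduce the input."""
    mr_cycle = syn_mr(syn_ct(mr))
    ct_cycle = syn_ct(syn_mr(ct))
    return float(np.mean(np.abs(mr_cycle - mr)) + np.mean(np.abs(ct_cycle - ct)))


# Toy stand-ins: two mappings that are exact inverses of each other.
to_ct = lambda x: 2.0 * x + 1.0
to_mr = lambda y: (y - 1.0) / 2.0

mr = np.random.rand(4, 4)
ct = np.random.rand(4, 4)
consistent = cycle_loss(to_ct, to_mr, mr, ct)
```

This term is what makes training on unpaired data possible: no aligned MR/CT pair is needed, only that each image survives the round trip.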


https://i.imgur.com/GqVaskb.png


Auto-Conditioned LSTM Network for Extended Complex Human Motion Synthesis
Zimo Li and Yi Zhou and Shuangjiu Xiao and Chong He and Hao Li
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.LG

[link] Summary by Kirill Pevzner 6 years ago

Problem
----------
Motion prediction

Dataset
----------
CMU


Approach
--------------
Auto-conditioned LSTM - an LSTM network that uses only a fraction of the input timesteps, but all of the outputs (somewhat similar to keyframes).
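
One way to read this: ground-truth frames are fed for some timesteps, and the network's own previous output is fed for the others, while every output is kept. The sketch below illustrates such a schedule, assuming alternating blocks of `gt_len` ground-truth steps and `cond_len` self-conditioned steps. The `step` callable stands in for one LSTM step (its hidden state is omitted), and all names are hypothetical.

```python
def auto_conditioned_rollout(step, ground_truth, gt_len, cond_len):
    """Run `step` over a sequence, feeding ground truth only part of the time."""
    outputs = []
    prev = None
    for t, gt in enumerate(ground_truth):
        # Ground truth for the first gt_len steps of every (gt_len + cond_len) block,
        # the network's own previous output otherwise.
        use_gt = prev is None or (t % (gt_len + cond_len)) < gt_len
        x = gt if use_gt else prev
        prev = step(x)
        outputs.append(prev)  # every output is kept (and would contribute to the loss)
    return outputs


# Toy "network": predicts the input plus one.
outputs = auto_conditioned_rollout(lambda x: x + 1, [0, 10, 20, 30, 40, 50],
                                   gt_len=2, cond_len=2)
```

In the toy run, timesteps 2 and 3 never see the ground-truth values 20 and 30; they continue from the network's own predictions instead.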

https://image.ibb.co/nimSs5/acLSTM.png


Video
--------
https://www.youtube.com/watch?v=AWlpNeOzMig


Handwriting beautification using token means
Zitnick, C. Lawrence
ACM Special Interest Group on computer GRAPHics - 2013 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Problem
----------
Make stylus input prettier by making it closer to the mean shape of the input.


The Pose Knows: Video Forecasting by Generating Pose Futures
Walker, Jacob and Marino, Kenneth and Gupta, Abhinav and Hebert, Martial
arXiv e-Print archive - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Problem
---------
Video prediction with human objects


Contribution
--------------
Instead of the common approach of predicting directly in pixel-space, use explicit knowledge of human motion space to predict the future of the video.

Approach
--------------
1. VAE to model the possible future movements of humans in the pose space
2. Conditional GAN - use pose information to predict video in pixel space.



https://image.ibb.co/b1omVF/The_pose_knows.png


Forecasting Human Dynamics from Static Images
Chao, Yu-Wei and Yang, Jimei and Price, Brian L. and Cohen, Scott and Deng, Jia
arXiv e-Print archive - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Problem
-------------
Predict human motion from static image

http://www-personal.umich.edu/~ywchao/pictures/cvpr2017.png

Approach
----------
1. 2d pose sequence generator
2. convert 2d pose to 3d skeleton

https://image.ibb.co/eeBRxv/3D_PFNet.png

https://image.ibb.co/kERaVQ/Forecasting_Human_Dynamics_from_Static_Images_architecture.png


3-step training strategy
-------------------------
1. Train human 2d pose extractor using annotated video with 2d joint positions
2. 3d skeleton extractor: project mocap data to 2d and use as ground truth for training the 2d->3d skeleton converter
3. Full network training
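
Step 2 relies on projecting 3D mocap joints to 2D to obtain training pairs for the 2d->3d converter. A minimal sketch of such a projection, assuming a simple pinhole camera looking along the z axis (the camera model and the `focal_length` parameter are assumptions; the summary does not say which projection is used):

```python
import numpy as np


def project_to_2d(joints_3d, focal_length=1.0):
    """Perspective-project (J, 3) camera-space joints to (J, 2) image coordinates."""
    joints_3d = np.asarray(joints_3d, dtype=float)
    x, y, z = joints_3d[:, 0], joints_3d[:, 1], joints_3d[:, 2]
    return np.stack([focal_length * x / z, focal_length * y / z], axis=1)


# Two toy joints in front of the camera (z > 0).
skeleton_3d = [[1.0, 2.0, 4.0], [0.0, 0.0, 2.0]]
skeleton_2d = project_to_2d(skeleton_3d, focal_length=2.0)
```

Each (skeleton_2d, skeleton_3d) pair then serves as an input/target example for the converter.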


Datasets
-----------
1. Penn Action - annotated human pose in sports image sequences: bench_press, jumping_jacks, pull_ups...
2. MPII - human action videos with a single annotated frame
3. Human3.6M - video, depth and mocap. Actions include: sitting, purchasing, waiting




Evaluation
-------------
On the following tasks:
1. 2D pose forecasting 
2. 3D pose recovery


Skeleton-aided Articulated Motion Generation
Yichao Yan and Jingwei Xu and Bingbing Ni and Xiaokang Yang
arXiv e-Print archive - 2017 via Local arXiv
Keywords: cs.CV

[link] Summary by Kirill Pevzner 6 years ago

Problem
---------------
Video generation of human motion given:
1. Single appearance reference image
2. Skeleton motion sequence


Datasets
-----------
* KTH - grayscale human actions
* Human3.6M - color multiview human actions


Approach
---------------
Conditional GANs.
The authors try both Stack GAN and Siamese GAN.
The latter provides better results.

https://preview.ibb.co/ighxQQ/Skeleton_aided_Articulated_Motion_Generation.png

Questions
----------------
Isn't using a full sequence of human skeleton motion considered more than a "hint"?


Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
Bogo, Federica and Kanazawa, Angjoo and Lassner, Christoph and Gehler, Peter V. and Romero, Javier and Black, Michael J.
European Conference on Computer Vision - 2016 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Problem
----------
Given an unconstrained image, estimate:
1. 3d pose of human skeleton 
2. 3d body mesh


Contributions
-----------
1. full body mesh extraction from image
2. improvement of state of the art


Datasets
-------------
1. Leeds Sports
2. HumanEva
3. Human3.6M


Approach
----------------
Consider the problem both bottom-up and top-down.
1. Bottom-up: DeepCut CNN model to fit 2D joint positions onto the image.
2. Top-down: A skinned multi-person linear model (SMPL) is fitted and projected onto the 2D joint positions and the image.


StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Zhang, Han and Xu, Tao and Li, Hongsheng and Zhang, Shaoting and Huang, Xiaolei and Wang, Xiaogang and Metaxas, Dimitris N.
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Problem
------------
Text to image


Contributions
-----------------
* Images are more photo-realistic and higher resolution than those of previous methods
* Stacked generative model


Approach
-------------
2 stage process:
1. Text-to-image: generates a low-resolution image with primitive shape and color.
2. Low-to-hi-res: using the low-res image and the text, generates a hi-res image, adding details and sharpening the edges.

https://pbs.twimg.com/media/Cziw6bfWgAAh3Yg.jpg


Datasets
--------------
* CUB - Birds
* Oxford-102 - Flowers


Results
--------
https://cdn-images-1.medium.com/max/1012/1*sIphVx4tqaXJxtnZNt3JWA.png


Criticism/ Questions
-------------------
* Is it possible the resulting images are replicas of images in the original dataset? To what extent does the model "hallucinate" new images?


Unsupervised Learning for Physical Interaction through Video Prediction
Finn, Chelsea and Goodfellow, Ian J. and Levine, Sergey
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Problem
----------
Given a video of robot motion, predict future frames of the motion.

Dataset
-----------
1. The authors assembled a new dataset of 59,000 robot interactions involving pushing motions.
2. Human3.6M - video, depth and mocap. Actions include: sitting, purchasing, waiting...

Approach
------------
* Use LSTMs to "remember" previous frames.
* Predict 10 transformations of the previous frame (each model represents the transformation differently).
* Predict a mask to determine which transformation is applied to which pixel.

The authors suggest 3 models based on this approach:
1. Dynamic Neural Advection
2. Convolutional Dynamic Neural Advection
3. Spatial Transformer Predictors
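
The mask step of the approach can be sketched as follows: each candidate transformation produces its own version of the previous frame, and a per-pixel softmax over mask logits decides how the candidates are blended. Array shapes and names are assumptions for illustration; the transformations themselves are taken as given.

```python
import numpy as np


def composite(transformed, mask_logits):
    """Blend K transformed frames (K, H, W) using per-pixel softmax masks.

    Returns the predicted frame (H, W) and the normalized masks (K, H, W).
    """
    logits = mask_logits - mask_logits.max(axis=0, keepdims=True)  # numerical stability
    masks = np.exp(logits)
    masks /= masks.sum(axis=0, keepdims=True)  # masks sum to 1 at every pixel
    return (masks * transformed).sum(axis=0), masks


# Two toy candidates: an all-black and an all-white frame, with equal mask logits.
candidates = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
frame, masks = composite(candidates, np.zeros((2, 2, 2)))
```

Because the masks sum to 1 at each pixel, every output pixel is a convex combination of the candidate transformations.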


Deep multi-scale video prediction beyond mean square error
Mathieu, Michaël and Couprie, Camille and LeCun, Yann
arXiv e-Print archive - 2015 via Local Bibsonomy
Keywords: dblp

[link] Summary by Kirill Pevzner 6 years ago

Predict frames of a video using 3 newly proposed and complementary methods:
1. Multi-scale CNN
2. GAN
3. Image gradient difference loss


Datasets:
-----------
* UCF101
* Sports1M


GAN
------
Generator:
   * Input: several frames of video from dataset
   * output: next frame of video

Discriminator:
   * Input: the original frames and a last frame
   * Output: whether the last frame is from the dataset or generated

Problem: Results are still blurry at the edges of moving objects.
Solution: Image gradient difference loss
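
A minimal sketch of an image gradient difference loss for a single-channel frame, assuming neighbouring-pixel differences as the image gradient and an exponent `alpha` (both the simplification to one channel and the default `alpha=1` are assumptions):

```python
import numpy as np


def gradient_difference_loss(pred, target, alpha=1.0):
    """Penalize differences between the gradient magnitudes of two (H, W) frames."""
    loss = 0.0
    for axis in (0, 1):
        # Absolute difference between neighbouring pixels along this axis.
        grad_pred = np.abs(np.diff(pred, axis=axis))
        grad_target = np.abs(np.diff(target, axis=axis))
        loss += np.sum(np.abs(grad_target - grad_pred) ** alpha)
    return float(loss)


target = np.array([[0.0, 1.0], [0.0, 1.0]])  # sharp vertical edge
blurry = np.zeros((2, 2))                    # prediction with no edge at all
gdl = gradient_difference_loss(blurry, target)
```

A prediction that smooths an edge away has small gradients where the target has large ones, so this term penalizes exactly the blur that a plain pixel-wise loss tolerates.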
