Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network on ShortScience.org

arxiv.org
scholar.google.com

Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network
Shi, Wenzhe and Caballero, Jose and Huszár, Ferenc and Totz, Johannes and Aitken, Andrew P. and Bishop, Rob and Rueckert, Daniel and Wang, Zehan
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

Summaries/Notes 1

[link] Summary by Qure.ai 7 years ago

Sub-pixel CNN proposes a novel architecture for solving the ill-posed problem of super-resolution (SR). Super-resolution involves the recovery of a high resolution (HR) image or video from its low resolution (LR) counter part and is topic of great interest in digital image processing.

#### Background & problems with current approaches
LR images can be understood as low-pass filtered versions of HR images. A key assumption that underlies
many SR techniques is that much of the high-frequency spatial data is redundant and thus can be accurately reconstructed from low frequency components.

There are 2 kinds of SR techniques - one which assumes multiple LR images as different instances of HR images and uses CV techniques like registration and fusion to construct HR images. Apart from constraints, of requiring multiple images, they are inaccurate as well. The other one single image super-resolution (SISR) techniques learn implicit redundancy that is present in natural data to recover missing HR information from a single LR instance.

Among SISR techniques, the ones which deployed deep learning, most methods would tend to upsample LR images (using bicubic interpolation, with learnable filters) and learn the filters from the upsampled image. The problem with the methods are, normally creasing the resolution of the LR images before the image enhancement step increases the computational complexity. This is especially problematic for convolutional networks, where the processing speed directly depends on the input image resolution. Secondly, interpolation methods typically used to accomplish the task, such as bicubic interpolation
do not bring additional information to solve the ill-posed reconstruction problem.

Also, among other techniques like deconvolution, the pixels are filled with zero values between pixels, hence hese zero values have no gradient information that can be backpropagated through.

#### Innovation
In sub-pixel CNN authors propose an architecture (ESPCNN) that learns all the filters in 2 convolutional layers with the resolutions in LR space. Only in the last layer, a convolutional layer is implemented which transforms into HR space, using sub-pixel convolutional layer. The layer, in order to tide over problems from deconvolution, uses something known as Periodic Shuffling (PS) operator for upsampling. The idea behind periodic shuffling is derived from the fact that activations between pixel shouldn't be zero for upsamling. The kernel in sub-pixel CNN does a rearrangement following the equation
![PSoperator](https://i.imgur.com/OJtHL3w.png)
The image explains the architecture of Sub-pixel CNN. The colouring in the last layer explains the methodology for PS
![ESPCNN_structure](https://i.imgur.com/py0vceQ.png)

#### Results
The ESPCNN was trained and tested on ImageNet along with other databases . As is evident from the image, ESPCNN performs significantly better than Super-Resolution Convolutional Neural Network (SRCNN) & TRNN, which are currently the best performing approach published, as of date of publication of ESPCNN.
![ESPCNN_results](https://i.imgur.com/hOEjSeE.png)

Your comment:

Write your summary here (You can use $\LaTeX$ and markdown syntax):

Anon Private