Main purpose: * This work proposes a software-based resolution augmentation method which is more agile and simpler to implement than hardware engineering solutions. * The paper examines three deep learning single image super resolution techniques on pCLE images * A video-registration based method is proposed to estimate ground truth HR pCLE images (this can be assumed as the main objective of the paper) Highlights: * The papers emphasise that this is the first work to address the image resolution problem in pCLE image acquisitions * The paper introduces useful information on how pCLE devices work * Strong related work * Clear story * Comprehensive evaluation Main Idea: * Use video-registration based techniques to estimate the HR images (real ground truth HR image is not available) * Simulate LR images from estimate HR images with help of Voronoi diagram and Delaunay-based linear interpolation. * Train an Exemplar-based SR model (EBSR -- DL-based approach) to learn the mapping between simulated LR and estimate HR images. Methodology Details * To estimate the HR images, a video-registration based mosaicking techniques (by the same authors in MIA 2006) is used which fuses a collection of input images by averaging the temporal information. * Since mosaicking generates single large filed-of-view mosaic image from LR images, the mosaic-to-image diffeomorphic spatial transformation is used which results from the mosaicking process to propagate and crop the fused information from the mosaic back into each input LR image space. * At this point, the authors observe that the misalignment between input LR images (used in the video-registration based mosaicking technique) and estimate HR cause training problem for the EBSR model. So, they treat the HR images as realistic and chose to simulate LR images from them!!!! * Simulated LR images by obtained using the Voronoi diagram (averaging the Voronoi cell on HR image) + additive noise on estimate HR images. * Finally, they build to experimental datasets 1) LR_org and HR and 2) LR_synth and HR and train three CNN SR models on these twor datasets. * They train FSRCNN, EDSR, SRGAN * The networks are trained using L1+SSIM loss functions Experiment Notes: * SSIM and GCF are used to quantitatively assess the performance of the models. * A composite score is also used to take SSIM and GCF into account jointly * In the ideal case, when the models are trained and etsted on simulated LR and HR images, the quantitative results are convincing. * "From this experiment, it is possible to conclude that the proposed solution is capable of performing SR reconstruction when the models are trained on synthetic data with no domain gap at test time" * When models are trained and tested on original LR and estimate HR images, the performance is not reasonable * When the models are trained on simulated LR images and tested on original LR images, the results become better compared to the previous case, * For a solid conclusion, and MOS study was carried out. The models are trained on simulated LR images.