Bayesian Skip-Autoencoders for Unsupervised Hyperintense Anomaly Detection in High Resolution Brain MRI

Christoph Baur and Benedikt Wiestler and Shadi Albarqouni and Nassir Navab

2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI) - 2020 via Local CrossRef

Keywords:

Reconstructing high-resolution brain MR images with high fidelity is especially challenging because of the highly complex structure of the brain. The most promising approaches for this task are autoencoders and generative models such as Variational Autoencoders (VAE) or Generative Adversarial Networks (GAN). In Unsupervised Anomaly Detection (UAD), these architectures are trained only on images of healthy brain anatomy, not on images containing anomalies such as lesions. Processing an anomalous input image $\pmb{x}$ with a model trained to reconstruct healthy MR images should therefore produce a high reconstruction error $r$, indicating that the input MR image is likely to exhibit anomalies. The models developed for this task so far are limited either to low-resolution reconstructions, which lose a considerable amount of information, or to small image regions. Baur et al. hypothesize that these restrictions are caused by the low dimensionality of the models' latent space, whereas a high capacity would be necessary to reconstruct highly complex brain MR images. They therefore introduce skip connections into the autoencoder, which enable detailed high-resolution reconstructions thanks to the enhanced gradient flow. In addition, dropout is applied to the skip connections to prevent the model from learning the identity of the input image, since only the healthy anatomy corresponding to the possibly anomalous input image should be reconstructed. Sampling the network's parameters $\pmb{\theta}$ from a Bernoulli distribution $\pmb{\theta} \sim \mathcal{B}(p)$, with $p$ being the dropout rate, and using Monte Carlo dropout at test time also turns the Skip-Autoencoder into a Bayesian neural network. The proposed Skip-Autoencoder and the Bayesian Skip-Autoencoder are tested on Multiple Sclerosis and Glioblastoma datasets and evaluated using the Precision-Recall Curve (PRC), the area under it (AUPRC), and the Dice score.
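Of the evaluation metrics named above, the Dice score is the simplest to illustrate. The following is a minimal sketch, assuming binary masks stored as NumPy arrays; the function name `dice_score`, the smoothing term `eps`, and the toy masks are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def dice_score(pred_mask, true_mask, eps=1e-8):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks.

    eps is an assumed smoothing term that avoids division by zero
    when both masks are empty.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

# Toy example: a predicted anomaly mask and a ground-truth lesion mask.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
true = np.array([[1, 0, 0],
                 [0, 1, 1]])
dice = dice_score(pred, true)  # 2 overlapping pixels, 3 pixels in each mask
```

In an anomaly detection setting, the predicted mask would typically come from thresholding the reconstruction error; how that threshold is chosen is not specified here.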
An investigation of the Bernoulli dropout shows that the model does not learn the identity of the input regardless of whether dropout is applied; a random weight initialization seems to be sufficient. Regarding the skip connections, the best performance is achieved with a single skip connection close to the bottleneck layer. Adding more skip connections improves the performance on the Glioblastoma test set but reduces it on the Multiple Sclerosis test set. In general, the non-Bayesian Skip-Autoencoder outperforms the Bayesian Skip-Autoencoder. The reason is that, for the Bayesian Skip-Autoencoder, the reconstruction residual $r = \lvert \pmb{x} - \pmb{\hat{x}} \rvert$ is computed with the predictive mean of $n$ Monte Carlo samples as the reconstructed image $\pmb{\hat{x}}$, which blurs the reconstruction. However, the Bayesian Skip-Autoencoder makes it possible to quantify the epistemic uncertainty, and the pixels of hyperintense anomalies turn out to have a low variance compared to normal tissue.
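The residual computation for the Bayesian variant can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `toy_skip_autoencoder` is a hypothetical stand-in for a trained network (random weights, a single dropout-regularized skip path), and the shapes, dropout rate, and sample count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_skip_autoencoder(x, w_enc, w_dec, p_drop, rng):
    """Hypothetical stand-in for a trained Skip-Autoencoder."""
    z = np.tanh(x @ w_enc)                    # low-dimensional bottleneck code
    keep = rng.random(x.shape) >= p_drop      # Bernoulli mask on the skip path
    skip = x * keep / (1.0 - p_drop)          # inverted-dropout rescaling
    return z @ w_dec + skip                   # decoder output plus skip path

def mc_dropout_residual(x, forward, n_samples):
    """Predictive mean, per-pixel variance and residual r = |x - x_hat|."""
    samples = np.stack([forward(x) for _ in range(n_samples)])
    x_hat = samples.mean(axis=0)              # predictive mean over MC samples
    variance = samples.var(axis=0)            # per-pixel uncertainty estimate
    residual = np.abs(x - x_hat)              # reconstruction residual
    return residual, x_hat, variance

x = rng.random((4, 16))                       # 4 flattened toy "images"
w_enc = rng.normal(size=(16, 3))              # assumed random encoder weights
w_dec = rng.normal(size=(3, 16))              # assumed random decoder weights

def forward(inp):
    # Dropout stays active at test time, so every call is a new MC sample.
    return toy_skip_autoencoder(inp, w_enc, w_dec, 0.5, rng)

residual, x_hat, variance = mc_dropout_residual(x, forward, n_samples=20)
```

Averaging over the samples is what smooths the reconstruction, while the per-pixel variance across the same samples serves as the uncertainty estimate.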

Gaussian Processes in Machine Learning

Rasmussen, Carl Edward

Springer Advanced Lectures on Machine Learning - 2003 via Local Bibsonomy

Keywords: dblp

In this tutorial paper, Carl E. Rasmussen gives an introduction to Gaussian Process Regression, focusing on the definition, hyperparameter learning, and future research directions. A Gaussian Process is completely defined by its mean function $m(\pmb{x})$ and its covariance function (kernel) $k(\pmb{x},\pmb{x}')$. The mean function $m(\pmb{x})$ corresponds to the mean vector $\pmb{\mu}$ of a Gaussian distribution, whereas the covariance function $k(\pmb{x}, \pmb{x}')$ corresponds to the covariance matrix $\pmb{\Sigma}$. Thus, a Gaussian Process $f \sim \mathcal{GP}\left(m(\pmb{x}), k(\pmb{x}, \pmb{x}')\right)$ is a generalization of a Gaussian distribution over vectors to a distribution over functions. A random function vector $\pmb{\mathrm{f}}$ can be generated by a Gaussian Process through the following procedure:
1. Compute the components $\mu_i = m(\pmb{x}_i)$ of the mean vector $\pmb{\mu}$ for each input $\pmb{x}_i$ using the mean function $m(\pmb{x})$.
2. Compute the components $\Sigma_{ij} = k(\pmb{x}_i, \pmb{x}_j)$ of the covariance matrix $\pmb{\Sigma}$ using the covariance function $k(\pmb{x}, \pmb{x}')$.
3. Draw a function vector $\pmb{\mathrm{f}} = [f(\pmb{x}_1), \dots, f(\pmb{x}_n)]^T$ from the Gaussian distribution $\pmb{\mathrm{f}} \sim \mathcal{N}\left(\pmb{\mu}, \pmb{\Sigma} \right)$.
Applied to regression, this procedure should only yield function vectors that are consistent with the training data $\mathcal{D}$: a function vector $\pmb{\mathrm{f}}$ is rejected if it does not comply with $\mathcal{D}$.
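The three-step sampling procedure above can be sketched in NumPy. The zero mean function, the squared-exponential kernel with its hyperparameters, the input grid, and the jitter term are all assumptions chosen for illustration; the summary does not prescribe them.

```python
import numpy as np

def mean_fn(x):
    """Assumed prior mean function m(x) = 0."""
    return np.zeros(len(x))

def sq_exp_kernel(xa, xb, length_scale=1.0, signal_var=1.0):
    """Assumed squared-exponential covariance function k(x, x')."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 50)        # inputs x_1, ..., x_n

mu = mean_fn(x)                       # step 1: mu_i = m(x_i)
Sigma = sq_exp_kernel(x, x)           # step 2: Sigma_ij = k(x_i, x_j)

# Step 3: draw f ~ N(mu, Sigma) via a Cholesky factor of Sigma.
# The small jitter on the diagonal is a numerical-stability assumption.
chol = np.linalg.cholesky(Sigma + 1e-6 * np.eye(len(x)))
f = mu + chol @ rng.standard_normal(len(x))
```

Each new draw of the standard-normal vector gives a different function vector from the same prior.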
This rejection is achieved by conditioning the distribution on the training data $\mathcal{D}$, yielding, for noise-free observations, the posterior Gaussian Process $f \mid \mathcal{D} \sim \mathcal{GP}(m_D(\pmb{x}), k_D(\pmb{x},\pmb{x}'))$ with the posterior mean function $m_D(\pmb{x}) = m(\pmb{x}) + \pmb{\Sigma}(\pmb{X},\pmb{x})^T \pmb{\Sigma}^{-1}(\pmb{\mathrm{f}} - \pmb{\mathrm{m}})$ and the posterior covariance function $k_D(\pmb{x},\pmb{x}') = k(\pmb{x},\pmb{x}') - \pmb{\Sigma}(\pmb{X}, \pmb{x})^T \pmb{\Sigma}^{-1} \pmb{\Sigma}(\pmb{X}, \pmb{x}')$. Here $\pmb{\Sigma}(\pmb{X},\pmb{x})$ is the vector of covariances between every training case of $\pmb{X}$ and $\pmb{x}$, and $\pmb{\Sigma}$ is the covariance matrix of the training inputs.
Noisy observations $y(\pmb{x}) = f(\pmb{x}) + \epsilon$ with $\epsilon \sim \mathcal{N}(0,\sigma_n^2)$ can be taken into account with a second Gaussian Process that shares the mean $m$ and adds the noise variance to the covariance function $k$, resulting in $f \sim \mathcal{GP}(m,k)$ and $y \sim \mathcal{GP}(m, k + \sigma_n^2\delta_{ii'})$. The figure (https://i.imgur.com/BWvsB7T.png) illustrates the cases of noisy observations (variance at training points) and of noise-free observations (no variance at training points).
From the Machine Learning perspective, the mean and the covariance function are parametrised by hyperparameters and thus provide a way to include prior knowledge, e.g. knowing that the mean function is a second-order polynomial. To find the optimal hyperparameters $\pmb{\theta}$, 1. determine the log marginal likelihood $L = \log p(\pmb{y} \mid \pmb{x}, \pmb{\theta})$, 2. take the first partial derivatives of $L$ w.r.t. the hyperparameters, and 3. apply an optimization algorithm. It should be noted that a regularization term is not necessary for the log marginal likelihood $L$ because it already contains a complexity penalty term. The tradeoff between data fit and penalty is also performed automatically. Gaussian Processes provide a very flexible way of finding a suitable regression model.
However, they incur a high computational complexity of $\mathcal{O}(n^3)$ due to the inversion of the covariance matrix. In addition, generalizing Gaussian Processes to non-Gaussian likelihoods remains complicated.
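The posterior formulas and the log marginal likelihood can be sketched together. This is a minimal illustration under assumptions: a zero prior mean, a squared-exponential kernel, a small `noise_var` on the training covariance, and toy data; the names `gp_posterior` and `sq_exp_kernel` are hypothetical. The Cholesky factorization is the $\mathcal{O}(n^3)$ step mentioned above.

```python
import numpy as np

def sq_exp_kernel(xa, xb, length_scale=1.0, signal_var=1.0):
    """Assumed squared-exponential covariance function k(x, x')."""
    diff = xa[:, None] - xb[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-6):
    """Posterior mean, covariance and log marginal likelihood (zero prior mean)."""
    n = len(x_train)
    K = sq_exp_kernel(x_train, x_train) + noise_var * np.eye(n)
    K_s = sq_exp_kernel(x_train, x_test)      # covariances Sigma(X, x)
    K_ss = sq_exp_kernel(x_test, x_test)      # prior covariance k(x, x')

    chol = np.linalg.cholesky(K)              # O(n^3) factorization of K
    # alpha = K^{-1} y, obtained with two solves against the Cholesky factor
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))

    mean = K_s.T @ alpha                      # m_D(x) with m(x) = 0
    v = np.linalg.solve(chol, K_s)
    cov = K_ss - v.T @ v                      # k_D(x, x')

    # log p(y | x, theta) = -1/2 y^T K^{-1} y - 1/2 log|K| - n/2 log(2 pi)
    log_ml = (-0.5 * y_train @ alpha
              - np.log(np.diag(chol)).sum()
              - 0.5 * n * np.log(2.0 * np.pi))
    return mean, cov, log_ml

x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(x_train)                     # toy targets
x_test = np.array([-2.0, 0.0, 1.5, 3.0])      # first three are training inputs

mean, cov, log_ml = gp_posterior(x_train, y_train, x_test)
```

With a nearly noise-free setting such as this one, the posterior variance almost vanishes at the training inputs and grows away from them, which matches the noise-free case described for the figure. Maximizing `log_ml` over the kernel hyperparameters would be the optimization step; it is left out of this sketch.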
