This paper is based on an intriguing idea of combining state-space models (SSMs) and recurrent neural networks (RNNs). Ideally, it is very much needed: For the sequences which have distinct structure and high variability, probabilistic modelling is a big problem. The handcrafted and parameterised feature representations are widely used to ease the problem and it is customary to develop the probabilistic model on top of these extracted features from the signal (such as using a short-time Fourier transform and _then_ probabilistic modelling over these features). But machines should be able to handle learning the representation part as well. So the story of the paper. Here, it is termed neural network with stochastic layers but one can safely say that the model is a state-space model with a deterministic neural network layer which is tied to both hidden variables and observations. Still, due to the complicated nonlinearities, it is not easy to understand what's going on. One thing I found missing from the paper is the exact form of nonlinearity used for the NN layer as I am looking at it from the perspective of probabilistic modelling. But this is probably because of space reasons.