Towards Reverse-Engineering Black-Box Neural Networks
Paper summary

Oh et al. propose two approaches for "whitening" black-box neural networks, i.e. predicting details of their internals such as architecture or training procedure. In particular, they consider attributes of the architecture (activation function, dropout, max pooling, kernel size of convolutional layers, number of convolutional/fully connected layers, etc.), attributes of the optimization (batch size and optimization algorithm) and attributes of the data (data split and size).

To create a dataset of models, they trained roughly 11k models on MNIST, ensuring that each model reaches at least 98% accuracy on the validation set; they also consider ensembles.

For predicting model attributes, they propose two models, called kennen-o and kennen-i, see Figure 1. Kennen-o takes as input a set of $100$ predictions of the black-box model (i.e. its final probability distributions on $100$ query inputs) and tries to learn the attributes directly using an MLP with two fully connected layers. Kennen-i instead crafts a single input whose prediction allows reasoning about a specific model attribute. An example for kennen-i is shown in Figure 2. In experiments, they demonstrate that both models predict model attributes significantly better than chance. For details, I refer to the paper.

Figure 1: Illustration of the two proposed approaches, kennen-o (top) and kennen-i (bottom).

Figure 2: Illustration of the images created by kennen-i to classify different attributes. See the paper for details.
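To make the kennen-o idea more concrete, the following is a minimal sketch of its forward pass in plain Python, not the authors' implementation. It assumes the 100 query outputs are simply concatenated into one vector, passed through one hidden fully connected layer with a ReLU, and then through one output layer per attribute. The hidden width, the attribute names and the number of values per attribute are hypothetical placeholders, and the weights are random rather than trained.

```python
import math
import random

NUM_QUERIES = 100   # from the summary: 100 predictions per black-box model
NUM_CLASSES = 10    # MNIST has 10 classes
HIDDEN = 32         # hypothetical hidden width, not taken from the paper

# Hypothetical attribute heads: attribute name -> number of possible values.
ATTRIBUTES = {"activation": 3, "dropout": 2, "optimizer": 3}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_in, n_out):
    """Random (untrained) weights and zero biases for one layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_kennen_o(rng):
    """One shared hidden layer plus one output layer per attribute."""
    params = {"hidden": init_layer(rng, NUM_QUERIES * NUM_CLASSES, HIDDEN)}
    for name, n_values in ATTRIBUTES.items():
        params[name] = init_layer(rng, HIDDEN, n_values)
    return params


def kennen_o_forward(params, query_outputs):
    """Map the black box's 100 output distributions to attribute predictions."""
    # Assumption: the query outputs are flattened into a single input vector.
    x = [p for dist in query_outputs for p in dist]
    w, b = params["hidden"]
    h = [max(0.0, v) for v in linear(x, w, b)]  # ReLU
    return {name: softmax(linear(h, *params[name])) for name in ATTRIBUTES}


rng = random.Random(0)
params = init_kennen_o(rng)
# Stand-in for a black-box model's outputs on the 100 query images.
fake_outputs = [softmax([rng.gauss(0, 1) for _ in range(NUM_CLASSES)])
                for _ in range(NUM_QUERIES)]
prediction = kennen_o_forward(params, fake_outputs)
```

In an actual setup, the parameters would be trained with a cross-entropy loss per attribute on the outputs of the roughly 11k training models, and `fake_outputs` would be replaced by the queried model's real predictions.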
Towards Reverse-Engineering Black-Box Neural Networks
Seong Joon Oh and Max Augustin and Bernt Schiele and Mario Fritz
arXiv e-Print archive - 2017 via Local arXiv
Keywords: stat.ML, cs.CR, cs.CV, cs.LG


Summary by David Stutz 1 month ago