RandomOut: Using a convolutional gradient norm to win The Filter Lottery
Cohen, Joseph Paul and Lo, Henry Z. and Ding, Wei. 2016
Paper summary (openreview)
The paper introduces a heuristic which aims to revive "dead" units in neural networks with the ReLU activation. In such networks, units that are less useful may be abandoned during training because they no longer receive any gradient. This wastes capacity. The proposed heuristic is to detect when this happens and to reinitialize the units in question, so they get another shot at learning something useful.
The paper proposes to reset (randomly reinitialize) convolutional filters that are apparently not being trained well, using a criterion based on the gradients propagated to these filters.
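To make the "dead unit" idea concrete, here is a minimal sketch in plain Python of a single ReLU unit whose weights stop receiving gradient once its pre-activation is negative for every input. The function names, the toy inputs, and the large negative bias are illustrative assumptions, not taken from the paper.

```python
def relu_grad(pre_activation):
    # Derivative of ReLU with respect to its input (taken as 0 at exactly 0).
    return 1.0 if pre_activation > 0.0 else 0.0


def weight_grads(weights, bias, inputs, upstream=1.0):
    """Gradient w.r.t. each weight of one ReLU unit, summed over the inputs.

    `upstream` stands in for dL/d(output); it is a made-up constant here.
    """
    grads = [0.0] * len(weights)
    for x in inputs:
        pre = sum(w * xi for w, xi in zip(weights, x)) + bias
        g = upstream * relu_grad(pre)
        for i, xi in enumerate(x):
            grads[i] += g * xi
    return grads


inputs = [[1.0, 2.0], [0.5, 1.5], [2.0, 0.1]]

# Same weights, different bias: the second unit is never active on these inputs.
live = weight_grads([0.3, 0.2], 0.0, inputs)
dead = weight_grads([0.3, 0.2], -10.0, inputs)

print("live unit gradients:", live)
print("dead unit gradients:", dead)
```

The dead unit's gradients are exactly zero, so gradient descent alone cannot move its weights back into a useful regime; this is the situation a reinitialization heuristic targets.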
The authors observe a pattern they call The Filter Lottery (TFL), in which the choice of random seed causes high variance in training accuracy:
![](http://i.imgur.com/5rWig0H.png)
They use the convolutional gradient norm ($CGN$) \cite{conf/fgr/LoC015} to determine how much impact a filter has on the overall classification loss by taking the derivative of the loss function with respect to each weight in the filter:
$$CGN(k) = \sum_{i} \left|\frac{\partial L}{\partial w^k_i}\right|$$
They use the CGN to evaluate the impact of each filter on the error, and re-initialize a filter when the gradient norm of its weights falls below a specified threshold.
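The criterion can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: each filter is represented as a flat list of weights, the gradients are made-up numbers, and the threshold value, the Gaussian re-initialization, and its `scale` are hypothetical choices that the summary does not specify.

```python
import random


def cgn(filter_grads):
    """CGN(k): sum of absolute partial derivatives over one filter's weights."""
    return sum(abs(g) for g in filter_grads)


def reinit_low_cgn(filters, grads, threshold, rng, scale=0.1):
    """Re-draw every filter whose CGN is below `threshold`.

    Returns the new list of filters and the indices that were reset.
    The Gaussian draw with std `scale` is an assumed initializer.
    """
    new_filters = []
    reset = []
    for k, (weights, g) in enumerate(zip(filters, grads)):
        if cgn(g) < threshold:
            new_filters.append([rng.gauss(0.0, scale) for _ in weights])
            reset.append(k)
        else:
            new_filters.append(list(weights))
    return new_filters, reset


# Three toy filters (flattened) and their loss gradients; filter 1 is "dead".
filters = [[0.5, -0.2, 0.1], [0.0, 0.0, 0.0], [0.3, 0.3, -0.4]]
grads = [[0.2, -0.1, 0.05], [0.0, 1e-6, 0.0], [-0.3, 0.0, 0.1]]

new_filters, reset = reinit_low_cgn(
    filters, grads, threshold=1e-3, rng=random.Random(0)
)
print("reinitialized filters:", reset)
```

In a real training loop the gradients would come from backpropagation on a batch, and the check would run periodically so that a freshly reinitialized filter has time to receive gradient before being evaluated again.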