#### Problem addressed: A new type of activation function

#### Summary: This paper proposes a new activation function that computes an Lp norm over multiple projections of an input vector. The value of p can be learned from the training data and can also differ across hidden units. The intuition is that (1) different datasets may have different optimal p values, so it makes sense to make p tunable; and (2) allowing different units to take different p values can make the approximation of decision boundaries more efficient and more flexible. The empirical results support both intuitions, with comparable performance on three datasets.

#### Novelty: A generalization of pooling, but applied across channels (projections within a unit) rather than spatial locations. When each projection $w_i^\top x + b_i$ is constrained to be nonnegative, the $L_\infty$ case is equivalent to a maxout unit.

#### Drawbacks: Empirical performance is not very impressive, although there is evidence supporting the intuitions.

#### Datasets: MNIST, TFD, Pentomino

#### Resources: http://arxiv.org/abs/1311.1780

#### Presenter: Yingbo Zhou
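The unit described in the summary can be sketched as follows; this is a minimal NumPy illustration of the forward pass (names, shapes, and the test values are my own, not from the paper's code):

```python
import numpy as np

def lp_unit(x, W, b, p):
    """One learned-norm (Lp) unit: the Lp norm of k affine projections of x.

    x: (d,) input vector
    W: (k, d) projection weights
    b: (k,) biases
    p: norm order (learned per unit in the paper), p >= 1
    """
    z = W @ x + b                          # k projections of the input
    return (np.abs(z) ** p).sum() ** (1.0 / p)

# Example: with W = I and b = 0, the unit is just the Lp norm of x.
x = np.array([1.0, 2.0])
W = np.eye(2)
b = np.zeros(2)

print(lp_unit(x, W, b, 1.0))    # L1 norm: |1| + |2| = 3
print(lp_unit(x, W, b, 2.0))    # L2 norm: sqrt(5)
print(lp_unit(x, W, b, 100.0))  # large p approaches max_i |z_i| = 2
```

As p grows, the unit approaches max_i |z_i|; when the projections are constrained to be nonnegative, this limit is exactly the maxout unit, which is the equivalence noted under Novelty.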