Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Paper summary

An NLP paper.

> "conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks"

## Evaluation

* 1 billion word language modeling benchmark
* 100 billion word Google News corpus
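To make the mechanism in the quoted abstract concrete, here is a minimal sketch of a sparsely-gated MoE layer: a gating network scores a bank of feed-forward experts, only the top-k experts are actually evaluated for a given input (the conditional computation), and their outputs are combined with the renormalized gate weights. The class name `SparseMoELayer`, the single-linear-layer gate, and the plain NumPy setting are illustrative assumptions, not the paper's implementation, which additionally adds noise to the gate scores and auxiliary load-balancing losses.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class SparseMoELayer:
    """Minimal sketch of a sparsely-gated Mixture-of-Experts layer.

    Each expert is a small feed-forward network; a gating network scores
    all experts, but only the top-k are evaluated per input.
    """

    def __init__(self, d_in, d_hidden, d_out, n_experts=8, k=2, seed=0):
        rng = np.random.default_rng(seed)
        # Expert parameters: one-hidden-layer feed-forward nets.
        self.w1 = rng.normal(0, 0.1, (n_experts, d_in, d_hidden))
        self.w2 = rng.normal(0, 0.1, (n_experts, d_hidden, d_out))
        # Gating network: a single linear layer producing one score per expert.
        self.wg = rng.normal(0, 0.1, (d_in, n_experts))
        self.k = k

    def forward(self, x):
        # Score every expert, but keep only the top-k.
        scores = x @ self.wg                    # shape: (n_experts,)
        top = np.argsort(scores)[-self.k:]      # indices of the k best experts
        gates = softmax(scores[top])            # renormalize over the chosen experts
        # Conditional computation: only the selected experts are evaluated.
        out = 0.0
        for g, e in zip(gates, top):
            h = np.maximum(0.0, x @ self.w1[e])  # ReLU hidden layer of expert e
            out = out + g * (h @ self.w2[e])     # gate-weighted expert output
        return out

# Usage: route one input vector through 2 of 8 experts.
moe = SparseMoELayer(d_in=16, d_hidden=32, d_out=16, n_experts=8, k=2)
y = moe.forward(np.random.default_rng(1).normal(size=16))
print(y.shape)  # (16,)
```

Because only k experts run per input, capacity grows with the number of experts while per-example compute stays roughly constant, which is the source of the ">1000x capacity" claim in the abstract.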
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean
arXiv e-Print archive, 2017
Keywords: cs.LG, cs.CL, cs.NE, stat.ML
