A Comparison of Word Embeddings for the Biomedical Natural Language Processing
Paper summary

This paper demonstrates that Word2Vec (arXiv:1301.3781) can extract relationships between words and produce latent representations useful for medical data. The authors train this model on several different datasets, which yield different relationships between words.

https://i.imgur.com/hSA61Zw.png

The Word2Vec model works like an autoencoder that predicts the context of a word. The context of a word consists of its surrounding words, as shown below: given the word in the center, the neighboring words are predicted through a bottleneck in the autoencoder. Because a word appears in many contexts across a corpus, the model can never achieve zero error; minimizing the reconstruction error is what forces it to learn a latent representation.

https://i.imgur.com/EMtjTHn.png

Subjectively, we can observe the relationships between word vectors:

https://i.imgur.com/8C9EVq1.png
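The "center word predicts its neighbors" idea above can be sketched as the pair-generation step of skip-gram Word2Vec. This is a minimal illustration, not the paper's code; the corpus, window size, and function name are assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) training pairs within a fixed window.

    Each center word is paired with every word at most `window`
    positions away; these pairs are what a skip-gram model trains on.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical toy sentence for illustration.
corpus = "the patient was given aspirin".split()
pairs = skipgram_pairs(corpus, window=1)
print(pairs[:3])  # → [('the', 'patient'), ('patient', 'the'), ('patient', 'was')]
```

Because the same center word occurs with different neighbors throughout a corpus, the model cannot reproduce every pair exactly, which is the source of the nonzero reconstruction error described above.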
Yanshan Wang and Sijia Liu and Naveed Afzal and Majid Rastegar-Mojarad and Liwei Wang and Feichen Shen and Paul Kingsbury and Hongfang Liu
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.IR


Summary by Joseph Paul Cohen 2 weeks ago