Paper summary
Probabilistic latent semantic indexing (PLSI) is an approach to document retrieval that models the joint probability of words and documents as a mixture of multinomial distributions conditioned on latent semantic classes. The model rests on two independence assumptions. First, the observation pairs (d, w) are assumed to be generated independently. Second, conditioned on the latent class, words are generated independently of the specific document identity. Because the number of classes is smaller than the number of documents, each class acts as a bottleneck variable in predicting the distribution of words conditioned on documents.
Technical details
Given a word $w$ and a document $d$, their joint probability is modeled as follows:
$$P(d,w) = P(d)\,P(w|d) \tag{1}$$
$$P(w|d) = \displaystyle\sum\_{z\in Z} P(w|z)\,P(z|d) \tag{2}$$
where $z$ denotes a latent class.
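The bottleneck structure can be seen by writing the conditional distributions as matrices: the $|D| \times |W|$ matrix of $P(w|d)$ values factors into a $|D| \times |Z|$ matrix times a $|Z| \times |W|$ matrix, so its rank is at most $|Z|$. A minimal NumPy sketch of this decomposition, with toy sizes and variable names of my own choosing:

```python
import numpy as np

# Toy sizes: the number of latent classes |Z| is smaller than |D|,
# so the classes act as a bottleneck.
n_docs, n_words, n_topics = 5, 8, 2

rng = np.random.default_rng(1)
p_z_d = rng.dirichlet(np.ones(n_topics), size=n_docs)   # rows: P(z|d), one per document
p_w_z = rng.dirichlet(np.ones(n_words), size=n_topics)  # rows: P(w|z), one per class

# Equation (2) as a matrix product: P(w|d) is a mixture of the
# class-conditional word distributions, hence a rank-|Z| matrix.
p_w_d = p_z_d @ p_w_z   # shape (n_docs, n_words); each row sums to 1

assert np.allclose(p_w_d.sum(axis=1), 1.0)
assert np.linalg.matrix_rank(p_w_d) <= n_topics
```

Multiplying each row of `p_w_d` by the corresponding $P(d)$ would give the full joint $P(d,w)$ of equation (1).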
Following the likelihood principle, one determines the distributions in (1) and (2) by maximization of the log-likelihood function
$$\mathcal{L} = \displaystyle\sum\_{d \in D} \displaystyle\sum\_{w \in W} n(d,w) \log P(d,w)$$
The maximization is done by the Expectation Maximization (EM) algorithm.
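The E-step computes the class posterior $P(z|d,w) \propto P(w|z)P(z|d)$, and the M-step re-estimates $P(w|z)$ and $P(z|d)$ from the expected counts $n(d,w)P(z|d,w)$. A minimal NumPy sketch of this loop on synthetic counts (array names and toy data are my own, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

n_docs, n_words, n_topics = 4, 6, 2
# n[d, w]: term counts n(d, w); drawn >= 1 here to keep logs finite
n = rng.integers(1, 6, size=(n_docs, n_words)).astype(float)

# Random initialization of the conditional distributions
p_w_z = rng.random((n_topics, n_words))            # P(w|z)
p_w_z /= p_w_z.sum(axis=1, keepdims=True)
p_z_d = rng.random((n_docs, n_topics))             # P(z|d)
p_z_d /= p_z_d.sum(axis=1, keepdims=True)

for _ in range(50):
    # E-step: posterior P(z|d,w) ∝ P(w|z) P(z|d)
    joint = p_z_d[:, :, None] * p_w_z[None, :, :]  # shape (d, z, w)
    p_z_dw = joint / joint.sum(axis=1, keepdims=True)

    # M-step: re-estimate parameters from expected counts n(d,w) P(z|d,w)
    counts = n[:, None, :] * p_z_dw                # shape (d, z, w)
    p_w_z = counts.sum(axis=0)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = counts.sum(axis=2)
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

# Log-likelihood of the fitted model (dropping the constant P(d) term)
p_w_d = p_z_d @ p_w_z                              # P(w|d), shape (d, w)
ll = (n * np.log(p_w_d)).sum()
```

Each EM iteration is guaranteed not to decrease $\mathcal{L}$, though the algorithm may converge to a local maximum depending on the initialization.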
Results
![](http://1.bp.blogspot.com/-eSKjS0950ac/VUHFJ2Vv7fI/AAAAAAAAA4U/bMGoVNh5_O0/s1600/result.png)

