#### Introduction

* LargeVis is a technique to visualize large-scale and high-dimensional data in a low-dimensional space.
* The problem relates to both the information visualization and machine learning (and data mining) domains.
* [Link to the paper](https://arxiv.org/abs/1602.00370)

#### t-SNE

* State-of-the-art method for the visualization problem.
* Starts by constructing a similarity structure from the data and then projects the structure to a 2/3-dimensional space.
* An accelerated version has been proposed that uses a K-nearest-neighbor (KNN) graph as the similarity structure.
* Limitations:
    * Computational cost of constructing the KNN graph for high-dimensional data.
    * Efficiency of the graph visualization technique breaks down for large data ($O(N \log N)$ complexity).
    * Parameters are sensitive to the dataset.

#### LargeVis

* Constructs the KNN graph (more efficiently than t-SNE).
* Uses a principled probabilistic model for graph visualization.
* Say the input is given as $N$ datapoints in a $d$-dimensional space.

##### KNN Graph Construction

* Random projection trees are used for nearest-neighbor search in the high-dimensional space, with Euclidean distance as the distance metric.
* A tree is constructed by partitioning the entire space; the nodes of the tree correspond to subspaces.
* For every non-leaf node of the tree, select a random hyperplane that splits the corresponding subspace into two children.
* This is done until the number of points in a subspace falls below a threshold.
* Once the tree is constructed, each datapoint is assigned to a leaf node, and the points in that leaf's subspace are the candidate nearest neighbors of the datapoint.
* For high accuracy, a large number of trees must be constructed (which increases the computational cost).
* The paper counters this bottleneck by using the idea that "a neighbor of my neighbor is also likely my neighbor" to increase the accuracy of the constructed graph.
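The partitioning step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `leaf_size` threshold value, and the median-based split rule are assumptions.

```python
import numpy as np

def build_rp_tree(points, indices, leaf_size=10, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    # Leaf: stop splitting once the subspace holds few enough points
    # (the threshold from the text; leaf_size=10 is an arbitrary choice).
    if len(indices) <= leaf_size:
        return indices
    # Random hyperplane: a random direction, split at the median projection
    # (the exact split rule is an assumption, not a detail from the paper).
    direction = rng.normal(size=points.shape[1])
    proj = points[indices] @ direction
    mask = proj <= np.median(proj)
    return [build_rp_tree(points, indices[mask], leaf_size, rng),
            build_rp_tree(points, indices[~mask], leaf_size, rng)]

def leaves_of(tree, out):
    # Collect the index arrays stored at the leaves.
    if isinstance(tree, list):
        for child in tree:
            leaves_of(child, out)
    else:
        out.append(tree)

# Points that land in the same leaf are candidate nearest neighbors
# of one another.
X = np.random.default_rng(1).normal(size=(200, 16))
leaves = []
leaves_of(build_rp_tree(X, np.arange(len(X))), leaves)
```

In the full algorithm, a small forest of such trees yields a rough KNN graph, which the neighbor-of-my-neighbor exploration then refines.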
* Basically, construct an approximate KNN graph using random projection trees, and then, for each node, also search its neighbors' neighbors.
* Edges are assigned weights just as in t-SNE.

##### Probabilistic Model For Graph Visualization

* Given a pair of vertices, the probability of observing a (binary) edge between them is a function of the distance between the projections of the two vertices in the lower-dimensional space.
* The probability of observing an edge with weight $w$ is the $w$-th power of the probability of observing a binary edge.
* Given a weighted graph $G$, the likelihood of the graph can be modeled as the likelihood of the observed edges plus the likelihood of the negative edges (vertex pairs without edges).
* To simplify the objective function, some negative edges are sampled for each vertex using a noise distribution.
* The (positive) edges are sampled with probability proportional to their weight and then treated as binary edges.
* The resulting objective function can be optimized using asynchronous stochastic gradient descent (very effective on sparse graphs).
* The overall complexity is $O(sMN)$, where $s$ is the dimension of the lower-dimensional space, $M$ is the number of negative samples, and $N$ is the number of vertices.

#### Experiments

* The data is first preprocessed with embedding-learning techniques like Skip-gram and LINE and brought down to a 100-dimensional space.
* One limitation is that the data is preprocessed to reduce the number of dimensions to 100. This seems to go against the claim of scaling to hundreds of dimensions.
* KNN graph construction is fastest for LargeVis, followed by random projection trees, NN-Descent, and vantage-point trees (used in t-SNE).
* LargeVis requires very few iterations to create highly accurate KNN graphs.
* A KNN classifier is used to measure the accuracy of the graph visualization quantitatively.
* Compared to t-SNE, LargeVis is much more stable with respect to the learning rate across datasets.
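The edge probability and the sampled objective described in the probabilistic-model section can be sketched as follows. This is a hedged sketch: $f(x) = 1/(1+x^2)$ is one probability function considered for this model, and `gamma`, a unified weight applied to sampled negative edges, is given an assumed value here.

```python
import numpy as np

def edge_prob(yi, yj):
    # P(binary edge between i and j) as a function of the distance between
    # their low-dimensional projections, assuming f(x) = 1 / (1 + x^2).
    return 1.0 / (1.0 + np.sum((yi - yj) ** 2))

def sampled_objective(Y, pos_edges, neg_edges, gamma=7.0):
    # Log-likelihood of sampled positive edges (treated as binary after
    # weight-proportional sampling) plus gamma-weighted sampled negative
    # edges; gamma=7.0 is an assumed value, not taken from the paper.
    eps = 1e-12  # guard against log(0)
    obj = sum(np.log(edge_prob(Y[i], Y[j]) + eps) for i, j in pos_edges)
    obj += gamma * sum(np.log(1.0 - edge_prob(Y[i], Y[k]) + eps)
                       for i, k in neg_edges)
    return obj
```

Maximizing this objective pulls connected vertices together and pushes sampled negative pairs apart; in the full algorithm, $M$ negative edges are drawn per positive edge from the noise distribution and the updates run via asynchronous SGD.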
* LargeVis benefits from its linear complexity (compared to t-SNE's log-linear complexity) and, for the default learning rate, outperforms t-SNE on larger datasets.