Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)Shrivastava, Anshumali and 0001, Ping Li2014
Paper summarynipsreviewsThis paper generalizes the LSH method to account for the (bounded) lengths of the data base vectors, so that the LSH tricks for fast approximate nearest neighbor search can exploit the well-known relation between Euclidian distance and dot product similarity (e.g. as in equation 2) and support MIPS search as well. They give 3 motivating examples where solving MIPS vs kNN per se is more appropriate and needed. Their algorithm is essentially equation 9 (using equation 7 compute vector reformulations $Q(q)$ and $P(x)$ of the query a database element respectively). This is based on apparently novel observation (equation 8) that the distance from the query converges to the dot product plus a constant, when a parameter m which exponentiated the $P(x)$ vector elements is sufficiently large (e.g. just 3 is claimed to suffice, leading to vectors $Q(q)$ and $P(x)$ which are just that m times larger than the original input dimensionality.
This paper generalizes the LSH method to account for the (bounded) lengths of the data base vectors, so that the LSH tricks for fast approximate nearest neighbor search can exploit the well-known relation between Euclidian distance and dot product similarity (e.g. as in equation 2) and support MIPS search as well. They give 3 motivating examples where solving MIPS vs kNN per se is more appropriate and needed. Their algorithm is essentially equation 9 (using equation 7 compute vector reformulations $Q(q)$ and $P(x)$ of the query a database element respectively). This is based on apparently novel observation (equation 8) that the distance from the query converges to the dot product plus a constant, when a parameter m which exponentiated the $P(x)$ vector elements is sufficiently large (e.g. just 3 is claimed to suffice, leading to vectors $Q(q)$ and $P(x)$ which are just that m times larger than the original input dimensionality.