Locality-sensitive hash (LSH) functions are invaluable tools for approximate near-neighbor problems in high-dimensional spaces. In this work, we focus on LSH schemes where the similarity metric is the cosine measure. The contribution of this work is a new class of locality-sensitive hash functions for the cosine similarity measure based on the theory of concomitants, which arises in order statistics. Consider n i.i.d. sample pairs {(X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)} obtained from a bivariate distribution f(X, Y). Concomitant theory captures the relation between the order statistics of X and Y in the form of a rank distribution given by Prob(Rank(Y_i) = j | Rank(X_i) = k). We exploit properties of the rank distribution to develop a locality-sensitive hash family that has excellent collision rate properties for the cosine measure. The computational cost of the basic algorithm is high for long hash lengths. We introduce several approximations based on the properties of concomitant order statistics and discrete transforms that perform almost as well at significantly reduced computational cost. We demonstrate the practical applicability of our algorithms by using them to find similar images in an image repository.
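
The rank distribution mentioned above can be explored numerically. The following is a minimal sketch, not the algorithm from this work: it assumes f(X, Y) is a standard bivariate normal with correlation rho, and it estimates the distribution of the rank of the Y paired with the smallest X (that is, the case k = 1). The function names `concomitant_rank_of_min` and `estimate_rank_distribution` are hypothetical, introduced only for illustration. The connection to the cosine measure rests on a standard fact: projecting two unit vectors onto the same random Gaussian direction yields a bivariate normal pair whose correlation equals the cosine of the angle between the vectors.

```python
import random


def concomitant_rank_of_min(n, rho, rng):
    """Draw n i.i.d. pairs (X, Y) from a standard bivariate normal with
    correlation rho, and return the 1-based rank of the Y that is paired
    with the smallest X (the concomitant of the first order statistic)."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        # Y = rho*X + sqrt(1 - rho^2)*Z has unit variance and corr(X, Y) = rho.
        ys.append(rho * x + (1.0 - rho * rho) ** 0.5 * z)
    i = min(range(n), key=xs.__getitem__)       # index where Rank(X_i) = 1
    return 1 + sum(1 for y in ys if y < ys[i])  # Rank(Y_i) among all Y


def estimate_rank_distribution(n, rho, trials=5000, seed=0):
    """Monte Carlo estimate of Prob(Rank(Y_i) = j | Rank(X_i) = 1), j = 1..n."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(trials):
        counts[concomitant_rank_of_min(n, rho, rng) - 1] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        dist = estimate_rank_distribution(n=16, rho=rho)
        print(f"rho={rho}: Prob(Rank(Y)=1 | Rank(X)=1) ~ {dist[0]:.3f}")
```

Under these assumptions, the estimated probability that the two ranks coincide is close to 1/n when rho = 0 and approaches 1 as rho approaches 1, which is the kind of similarity-dependent collision behavior a locality-sensitive hash family needs.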