The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Two algorithms for nearest-neighbor search in high dimensions
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
Incremental distance join algorithms for spatial databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Multidimensional access methods
ACM Computing Surveys (CSUR)
An optimal algorithm for approximate nearest neighbor searching fixed dimensions
Journal of the ACM (JACM)
The string B-tree: a new data structure for string search in external memory and its applications
Journal of the ACM (JACM)
Density-based indexing for approximate nearest-neighbor queries
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Distance browsing in spatial databases
ACM Transactions on Database Systems (TODS)
LOF: identifying density-based local outliers
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Closest pair queries in spatial databases
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
A cost model for query processing in high dimensional data spaces
ACM Transactions on Database Systems (TODS)
Optimal aggregation algorithms for middleware
PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Similarity estimation techniques from rounding algorithms
STOC '02 Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
The TV-tree: an index structure for high-dimensional data
The VLDB Journal — The International Journal on Very Large Data Bases - Spatial Database Systems
Indexing the Solution Space: A New Technique for Nearest Neighbor Search in High-Dimensional Space
IEEE Transactions on Knowledge and Data Engineering
On the 'Dimensionality Curse' and the 'Self-Similarity Blessing'
IEEE Transactions on Knowledge and Data Engineering
Clustering for Approximate Similarity Search in High-Dimensional Spaces
IEEE Transactions on Knowledge and Data Engineering
When Is ''Nearest Neighbor'' Meaningful?
ICDT '99 Proceedings of the 7th International Conference on Database Theory
Approximate Nearest Neighbor Searching in Multimedia Databases
Proceedings of the 17th International Conference on Data Engineering
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Evaluating Top-k Selection Queries
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Similarity Search in High Dimensions via Hashing
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Contrast Plots and P-Sphere Trees: Space vs. Time in Nearest Neighbour Searches
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Efficient similarity search and classification via rank aggregation
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
A Replacement for Voronoi Diagrams of Near Linear Size
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
A Sampling-Based Estimator for Top-k Query
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
LDC: Enabling Search By Partial Distance In A Hyper-Dimensional Space
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Navigating nets: simple algorithms for proximity search
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Locality-sensitive hashing scheme based on p-stable distributions
SCG '04 Proceedings of the twentieth annual symposium on Computational geometry
Fast Approximate Similarity Search in Extremely High-Dimensional Data Sets
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
LSH forest: self-tuning indexes for similarity search
WWW '05 Proceedings of the 14th international conference on World Wide Web
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search
ACM Transactions on Database Systems (TODS)
Entropy based nearest neighbor search in high dimensions
SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Multi-probe LSH: efficient indexing for high-dimensional similarity search
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
SFCS '75 Proceedings of the 16th Annual Symposium on Foundations of Computer Science
Enumerating the k closest pairs optimally
SFCS '92 Proceedings of the 33rd Annual Symposium on Foundations of Computer Science
Nearest Neighbor Retrieval Using Distance-Based Hashing
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Quality and efficiency in high dimensional nearest neighbor search
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Matching query processing in high-dimensional space
Proceedings of the 20th ACM international conference on Information and knowledge management
Distributed similarity estimation using derived dimensions
The VLDB Journal — The International Journal on Very Large Data Bases
Locality-sensitive hashing scheme based on dynamic collision counting
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Inter-media hashing for large-scale retrieval from heterogeneous data sources
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Effective hashing for large-scale multimedia search
Proceedings of the 2013 Sigmod/PODS Ph.D. symposium on PhD symposium
Super-EGO: fast multi-dimensional similarity join
The VLDB Journal — The International Journal on Very Large Data Bases
Approximate high-dimensional nearest neighbor queries using R-forests
Proceedings of the 17th International Database Engineering & Applications Symposium
Hi-index | 0.00 |
Nearest Neighbor (NN) search in high-dimensional space is an important problem in many applications. From the database perspective, a good solution needs to have two properties: (i) it can be easily incorporated in a relational database, and (ii) its query cost should increase sublinearly with the dataset size, regardless of the data and query distributions. Locality-Sensitive Hashing (LSH) is a well-known methodology fulfilling both requirements, but its current implementations either incur expensive space and query cost, or abandon its theoretical guarantee on the quality of query results. Motivated by this, we improve LSH by proposing an access method called the Locality-Sensitive B-tree (LSB-tree) to enable fast, accurate, high-dimensional NN search in relational databases. The combination of several LSB-trees forms a LSB-forest that has strong quality guarantees, but improves dramatically the efficiency of the previous LSH implementation having the same guarantees. In practice, the LSB-tree itself is also an effective index which consumes linear space, supports efficient updates, and provides accurate query results. In our experiments, the LSB-tree was faster than: (i) iDistance (a famous technique for exact NN search) by two orders of magnitude, and (ii) MedRank (a recent approximate method with nontrivial quality guarantees) by one order of magnitude, and meanwhile returned much better results. As a second step, we extend our LSB technique to solve another classic problem, called Closest Pair (CP) search, in high-dimensional space. The long-term challenge for this problem has been to achieve subquadratic running time at very high dimensionalities, which fails most of the existing solutions. We show that, using a LSB-forest, CP search can be accomplished in (worst-case) time significantly lower than the quadratic complexity, yet still ensuring very good quality. In practice, accurate answers can be found using just two LSB-trees, thus giving a substantial reduction in the space and running time. In our experiments, our technique was faster: (i) than distance browsing (a well-known method for solving the problem exactly) by several orders of magnitude, and (ii) than D-shift (an approximate approach with theoretical guarantees in low-dimensional space) by one order of magnitude, and at the same time, outputs better results.