Nearest neighbor (NN) search in high-dimensional space is an important problem in many applications. Ideally, a practical solution should (i) be implementable in a relational database, and (ii) have a query cost that grows sub-linearly with the dataset size, regardless of the data and query distributions. Despite the bulk of NN literature, no solution fulfills both requirements except locality sensitive hashing (LSH). Existing LSH implementations are either rigorous or ad hoc. Rigorous-LSH ensures good quality of query results, but incurs high space and query costs. Adhoc-LSH is more efficient, but it abandons quality control: the neighbor it outputs can be arbitrarily bad. As a result, no current method ensures both quality and efficiency in practice. Motivated by this, we propose a new access method called the locality sensitive B-tree (LSB-tree) that enables fast high-dimensional NN search with excellent quality. Combining several LSB-trees yields a structure called the LSB-forest, which ensures the same result quality as rigorous-LSH but dramatically reduces its space and query cost. The LSB-forest also outperforms adhoc-LSH, even though the latter has no quality guarantee. Besides its appealing theoretical properties, the LSB-tree itself serves as an effective index that consumes linear space and supports efficient updates. Our extensive experiments confirm that the LSB-tree is faster than (i) the state of the art in exact NN search by two orders of magnitude, and (ii) the best (linear-space) method of approximate retrieval by an order of magnitude, while returning neighbors of much better quality.
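To make the LSH idea referred to above more concrete, the following is a minimal sketch of a generic single-table LSH index for Euclidean distance. It is not the LSB-tree or the LSB-forest, whose construction the abstract does not describe. The class name `PStableLSH`, its parameters (`num_hashes`, `bucket_width`, `seed`) and the choice of Gaussian random projections are assumptions made for illustration only. Because the sketch probes a single bucket, it offers no quality guarantee, which is the kind of weakness the abstract attributes to adhoc-LSH.

```python
import math
import random
from collections import defaultdict


class PStableLSH:
    """Toy single-table LSH index (hypothetical, for illustration only)."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # Each hash is h(x) = floor((a . x + b) / w),
        # with a drawn from N(0, 1)^dim and b drawn uniformly from [0, w].
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_hashes)]
        self.b = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)  # hash key -> list of point ids
        self.points = []

    def _key(self, x):
        # Concatenate the individual hash values into one bucket key.
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )

    def insert(self, x):
        x = tuple(x)
        self.points.append(x)
        self.buckets[self._key(x)].append(len(self.points) - 1)

    def query(self, q):
        """Return (distance, point) for the closest point in q's bucket,
        or None if that bucket is empty. Points in other buckets are never
        examined, so the answer may be far from the true nearest neighbor."""
        q = tuple(q)
        candidates = self.buckets.get(self._key(q), [])
        if not candidates:
            return None
        best = min(candidates, key=lambda i: math.dist(self.points[i], q))
        return math.dist(self.points[best], q), self.points[best]
```

A point that was inserted is always found again when it is used as the query, since it hashes to its own bucket. Nearby but distinct queries may land in a different bucket and miss it, which is why practical schemes use several tables or additional probing.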