Approximate high-dimensional nearest neighbor queries using R-forests

Authors:
Michael Nolen; King-Ip Lin
Affiliations:
The University of Memphis, Memphis, TN; The University of Memphis, Memphis, TN
Venue:
Proceedings of the 17th International Database Engineering & Applications Symposium
Year:
2013

Abstract

Efficient query processing on high-dimensional data is important but remains a challenge, because the curse of dimensionality makes efficient solutions very difficult. On the other hand, it has been suggested that quickly returning a solution that is close enough is often sufficient. In this paper we introduce the R-Forest, a set of disjoint R-trees built over the domain of the search space. Each R-tree stores a subset of the points in a non-overlapping region of space, and this non-overlap is maintained throughout the life of the forest. We also introduce several new features: a median point used for ordering and searching, a pruning parameter, and restricted access. Combined, these features can be used to answer approximate nearest neighbor queries, returning results that improve on alternative methods, such as the Locality Sensitive Hashing B-tree (LSB-tree), with the same amount of I/O. Our approach handles different data distributions, even taking advantage of the distribution without any additional parameter tuning; it scales with increasing dimensionality; and, most importantly, it provides the user with feedback, in the form of a lower bound, on the quality of the results.
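
The abstract gives no algorithmic detail, so the following Python sketch is only one illustrative reading of the ideas it names, not the authors' method. It makes several simplifying assumptions: each "tree" is replaced by a flat list of points with a bounding box, the space is split into slabs by sorting on the first coordinate, "restricted access" is modeled as a cap on the number of partitions visited, and the pruning parameter is omitted. The names `Partition`, `build_forest`, `min_dist`, `approx_nn`, and `max_trees` are all hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import median


@dataclass
class Partition:
    """Stand-in for one R-tree: a disjoint subset of points plus its bounding box."""
    points: list
    lo: tuple            # lower corner of the bounding box
    hi: tuple            # upper corner of the bounding box
    median_point: tuple  # per-dimension median of the stored points


def build_forest(points, n_parts):
    """Split points into disjoint subsets (assumed scheme: slabs along dimension 0)."""
    pts = sorted(points)
    size = math.ceil(len(pts) / n_parts)
    forest = []
    for i in range(0, len(pts), size):
        chunk = pts[i:i + size]
        dims = list(zip(*chunk))
        forest.append(Partition(
            points=chunk,
            lo=tuple(min(d) for d in dims),
            hi=tuple(max(d) for d in dims),
            median_point=tuple(median(d) for d in dims),
        ))
    return forest


def min_dist(q, part):
    """Smallest Euclidean distance from q to any location inside part's bounding box."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, part.lo, part.hi)))


def approx_nn(forest, q, max_trees):
    """Search at most max_trees (>= 1) partitions, nearest median point first.

    Returns (best point, its distance, lower bound on the true NN distance).
    """
    order = sorted(forest, key=lambda p: math.dist(q, p.median_point))
    visited, skipped = order[:max_trees], order[max_trees:]
    best = min((pt for p in visited for pt in p.points),
               key=lambda pt: math.dist(q, pt))
    best_d = math.dist(q, best)
    # No point in a skipped partition can be closer than that partition's box,
    # so the true nearest-neighbor distance is at least this value.
    lower = min([best_d] + [min_dist(q, p) for p in skipped])
    return best, best_d, lower
```

In this sketch the true nearest-neighbor distance always lies between `lower` and the returned distance, which is one simple way to give the user feedback on result quality; raising `max_trees` trades more I/O for a tighter gap.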