Efficient query processing on high-dimensional data is important but remains a challenge, because the curse of dimensionality makes an efficient exact solution very difficult. At the same time, it has been suggested that it is often sufficient to quickly return a solution that is close enough. In this paper we introduce the R-Forest, a set of disjoint R-trees built over the domain of the search space. Each R-tree stores a subset of the points in a non-overlapping region of the space, and this non-overlap is maintained throughout the life of the forest. The R-Forest also includes several new features: a median point used for ordering and searching, a pruning parameter, and restricted access. Combined, these features can be used to answer Approximate Nearest Neighbor queries, returning a result that improves on alternative methods, such as the Locality Sensitive Hashing B-Tree (LSB-tree), with the same amount of I/O. Our approach handles different data distributions, even taking advantage of the distribution without any additional parameter tuning; it scales with increasing dimensionality; and, most importantly, it provides the user with feedback, in the form of a lower bound, on the quality of the results.
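The following is a minimal sketch of the general idea only, not the paper's algorithm. It assumes points in the unit cube, uses equal-width slabs along the first dimension as the disjoint regions, and uses a plain list in place of each R-tree. The names `Partition`, `build_forest`, `ann_query`, and `max_partitions` are hypothetical, as are the coordinate-wise definition of the median point and the use of a partition budget as a stand-in for restricted access. It orders partitions by the distance from the query to their median points, searches only the first few, and reports a lower bound on the true nearest-neighbor distance derived from the bounding boxes of the partitions it skipped.

```python
import math
import statistics
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One disjoint region; a plain list stands in for an R-tree (assumption)."""
    points: list = field(default_factory=list)

    def median_point(self):
        # Coordinate-wise median of the stored points (hypothetical definition).
        return tuple(statistics.median(c) for c in zip(*self.points))

    def min_dist(self, q):
        # Smallest possible distance from q to the bounding box of the points.
        low = [min(c) for c in zip(*self.points)]
        high = [max(c) for c in zip(*self.points)]
        return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                             for x, lo, hi in zip(q, low, high)))


def build_forest(points, slabs):
    """Assign points in [0, 1]^d to equal-width slabs along dimension 0."""
    forest = [Partition() for _ in range(slabs)]
    for p in points:
        idx = min(int(p[0] * slabs), slabs - 1)
        forest[idx].points.append(p)
    # Drop empty partitions so every remaining one has a median and a box.
    return [part for part in forest if part.points]


def ann_query(forest, q, max_partitions):
    """Search at most max_partitions (>= 1) partitions, nearest median first.

    Returns (best point, its distance, lower bound on the true NN distance).
    """
    order = sorted(forest, key=lambda part: math.dist(q, part.median_point()))
    visited, skipped = order[:max_partitions], order[max_partitions:]
    best = min((p for part in visited for p in part.points),
               key=lambda p: math.dist(q, p))
    best_dist = math.dist(q, best)
    # The true neighbor is either the point found, or it lies in a skipped
    # partition and is therefore at least that partition's box distance away.
    lower = min([best_dist] + [part.min_dist(q) for part in skipped])
    return best, best_dist, lower
```

In this sketch the ratio of the returned distance to the lower bound bounds how far the answer can be from optimal, which is one way a user could be given feedback on result quality; raising `max_partitions` trades more work for a tighter bound.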