In nearest neighbor searching, we are given a set of n data points in real d-dimensional space, ℝ^d, and the problem is to preprocess these points into a data structure so that, given a query point, the data point nearest to the query can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage. In this paper we consider a novel approach to nearest neighbor searching in which the search returns the correct nearest neighbor with a given probability, assuming that the queries are drawn from some known distribution. The query distribution is represented by a set of training query points supplied at preprocessing time. The data structure, called the overlapped split tree, is an augmented BSP-tree in which each node is associated with a cover region that determines whether the search should visit that node. We use principal component analysis and support vector machines to analyze the structure of the data and training points so that the tree structure adapts better to the data sets. We show empirically that this approach makes average query performance more predictable than it is with the kd-tree.
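The core idea of cover regions can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the paper's construction: each split direction is assumed to be the principal component of the node's points, the threshold is the median projection, and each child's cover region is assumed to be a slab along that direction sized so it contains a chosen fraction (`coverage`) of the training queries whose true nearest neighbor lies in that child. The paper's use of support vector machines is not reproduced here; the quantile slab is a swapped-in stand-in. All names (`Node`, `build`, `search`, `brute_nn`, `leaf_size`, `coverage`) are hypothetical.

```python
import numpy as np


class Node:
    """One node of a simplified overlapped split tree (hypothetical sketch)."""

    def __init__(self, idx):
        self.idx = idx            # indices of the data points in this subtree
        self.u = None             # split direction (unit vector, from PCA)
        self.t = 0.0              # split threshold along u
        self.left = None
        self.right = None
        self.left_hi = np.inf     # left cover region:  u.q <= left_hi
        self.right_lo = -np.inf   # right cover region: u.q >= right_lo


def brute_nn(P, idx, q):
    """Exact nearest neighbor of q among P[idx]; returns (index, squared distance)."""
    d2 = np.sum((P[idx] - q) ** 2, axis=1)
    k = int(np.argmin(d2))
    return int(idx[k]), float(d2[k])


def build(P, Q, idx, q_idx, leaf_size=8, coverage=0.95):
    """P: data points (n, d); Q: training queries (m, d).

    q_idx lists the training queries whose true nearest neighbor lies in P[idx].
    """
    node = Node(idx)
    if len(idx) <= leaf_size:
        return node
    X = P[idx]
    # Assumed split direction: principal component of the points in this cell.
    _, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    u = vt[0]
    proj = X @ u
    t = float(np.median(proj))
    left_mask = proj <= t
    if left_mask.all() or not left_mask.any():
        return node  # degenerate split: keep this node as a leaf
    node.u, node.t = u, t
    left_ids = set(idx[left_mask].tolist())
    # Route each training query to the child that holds its true nearest neighbor.
    qL, qR = [], []
    for j in q_idx:
        nn, _ = brute_nn(P, idx, Q[j])
        (qL if nn in left_ids else qR).append(j)

    def proj_q(js):
        # Project training queries one at a time, exactly as search() does.
        return np.array([Q[j] @ u for j in js])

    # Cover slabs: hold a `coverage` fraction of each child's training queries
    # and always include the child's own half-space, so that every query
    # visits at least one child. The two slabs overlap around t.
    node.left_hi = max(t, float(np.quantile(proj_q(qL), coverage))) if qL else t
    node.right_lo = min(t, float(np.quantile(proj_q(qR), 1 - coverage))) if qR else t
    node.left = build(P, Q, idx[left_mask], qL, leaf_size, coverage)
    node.right = build(P, Q, idx[~left_mask], qR, leaf_size, coverage)
    return node


def search(node, P, q):
    """Visit every child whose cover region contains q; return (index, sq. dist)."""
    if node.left is None:
        return brute_nn(P, node.idx, q)
    s = float(q @ node.u)
    best = (-1, np.inf)
    if s <= node.left_hi:
        best = min(best, search(node.left, P, q), key=lambda r: r[1])
    if s >= node.right_lo:
        best = min(best, search(node.right, P, q), key=lambda r: r[1])
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P = rng.normal(size=(500, 8))           # data points
    Q_train = rng.normal(size=(2000, 8))    # assumed sample of the query distribution
    root = build(P, Q_train, np.arange(len(P)), list(range(len(Q_train))))
    Q_test = rng.normal(size=(500, 8))      # fresh queries from the same distribution
    all_idx = np.arange(len(P))
    hits = sum(search(root, P, q)[0] == brute_nn(P, all_idx, q)[0] for q in Q_test)
    print(f"exact nearest neighbor found for {hits / len(Q_test):.1%} of test queries")
```

The `coverage` parameter plays the role of the target success probability: larger values widen the overlapping slabs, so more queries find their true nearest neighbor at the cost of visiting more leaves. With `coverage=1.0`, every training query is answered exactly by construction, while unseen queries still succeed only with some probability.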