Efficient nearest neighbor query based on extended B+-tree in high-dimensional space

Authors:
Jiangtao Cui;Zhiyong An;Yong Guo;Shuisheng Zhou
Affiliations:
School of Computer Science and Technology, Xidian University, Xi'an 710071, China;College of Computer Science and Technology, Shandong Institute of Business and Technology, Yantai 264005, China;School of Computer Science and Technology, Xidian University, Xi'an 710071, China;Department of Mathematics, School of Science, Xidian University, Xi'an 710071, China
Venue:
Pattern Recognition Letters
Year:
2010

Abstract

Nearest neighbor queries in high-dimensional space are important in various applications. One-dimensional mapping is an efficient indexing approach for speeding up k-nearest neighbor search: it transforms each high-dimensional point into a single-dimensional value that is indexed by a B+-tree. In this paper, we present a new one-dimensional indexing scheme based on an extended B+-tree for k-nearest neighbor search in high-dimensional space. We first partition the high-dimensional dataset and perform Principal Component Analysis on each partition. The distance from each point to the center of its partition is indexed using a B+-tree, and the projection of each point onto the first principal component is embedded in the leaf nodes of the B+-tree. At query time, a new filter strategy, based on the spatial relationship between the query point and the axis determined by the first principal component, is applied to improve query performance. We also present a novel k-nearest neighbor search algorithm that guarantees the accuracy of the query results. Extensive experiments demonstrate the effectiveness of our approach.
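The indexing idea in the abstract can be illustrated with a minimal sketch for a single partition. This is not the paper's algorithm: the abstract does not specify the partitioning method, the exact filter, or the k-nearest neighbor procedure, so everything below is an assumption made for illustration. A sorted Python list stands in for the B+-tree, the first principal component is approximated by power iteration, and the filter uses two generic lower bounds on distance (the triangle inequality on the distance-to-center key, and the fact that projection onto a unit axis never increases distances). The names `first_principal_component`, `build_index`, and `range_search` are hypothetical.

```python
import bisect
import math


def first_principal_component(points, center, iters=100):
    """Approximate the unit first principal component by power iteration."""
    dim = len(center)
    centered = [[x - c for x, c in zip(p, center)] for p in points]
    v = [1.0 / math.sqrt(dim)] * dim  # arbitrary unit start vector
    for _ in range(iters):
        # w = (X^T X) v, computed without forming the covariance matrix.
        w = [0.0] * dim
        for row in centered:
            s = sum(a * b for a, b in zip(row, v))
            for j in range(dim):
                w[j] += s * row[j]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate data; keep the current unit vector
        v = [x / norm for x in w]
    return v


def build_index(points):
    """Index one partition as sorted (distance to center, projection, id)."""
    dim = len(points[0])
    center = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    axis = first_principal_component(points, center)
    entries = []
    for i, p in enumerate(points):
        key = math.dist(p, center)  # one-dimensional key
        proj = sum((x - c) * a for x, c, a in zip(p, center, axis))
        entries.append((key, proj, i))  # projection stored next to the key
    entries.sort()  # assumption: a sorted list stands in for B+-tree leaves
    return center, axis, entries


def range_search(points, index, query, radius):
    """Return ids of all points within `radius` of `query` (exact result)."""
    center, axis, entries = index
    q_key = math.dist(query, center)
    q_proj = sum((x - c) * a for x, c, a in zip(query, center, axis))
    # Triangle inequality: a match must have |key - q_key| <= radius.
    start = bisect.bisect_left(entries, (q_key - radius,))
    hits = []
    for key, proj, i in entries[start:]:
        if key > q_key + radius:
            break  # keys are sorted, so no later entry can match
        if abs(proj - q_proj) > radius:
            continue  # projection filter: cheap rejection, no full distance
        if math.dist(points[i], query) <= radius:
            hits.append(i)
    return sorted(hits)
```

Both filters only discard points that provably lie outside the query radius, so the result matches a brute-force scan while computing fewer full distances. A k-nearest neighbor search could be layered on top by enlarging the radius until k points are found, which is again an assumption rather than the procedure described in the paper.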