Nearest neighbor search in high-dimensional space is an important operation in many applications, such as data mining and multimedia databases. Evaluating the similarity of a point to all other points in a high-dimensional space is computationally expensive. To reduce this cost, index structures are frequently used. Most of these index structures are built by partitioning the data set according to a specific criterion. However, partitioning approaches can fail to find the true nearest neighbor because the partitions are disjoint: the nearest neighbor of a query may lie in a different partition from the one that is searched. In this paper, we propose an Error Minimizing Partitioning (E-MP) method with a novel tree structure, which minimizes this failure in finding nearest neighbors. E-MP divides the data into subsets by taking the distribution of the data set into account. To partition a data set, the proposed method first finds the first principal component of the data set using principal component analysis (PCA). It then finds the centroid of the data set. Finally, it chooses the partitioning hyperplane that passes through the centroid and is perpendicular to the principal component vector. We also compare the proposed method with existing methods to verify its usability.
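The partitioning step described in the abstract can be illustrated with a minimal sketch. It covers only a single split (first principal component, centroid, perpendicular hyperplane) and is not the authors' E-MP implementation or tree structure. The function name `pca_split`, the use of NumPy's SVD to obtain the principal component, and the rule that points lying exactly on the hyperplane go to the non-negative side are all assumptions made for illustration.

```python
import numpy as np


def pca_split(points):
    """Split points with the hyperplane through the centroid that is
    perpendicular to the first principal component.

    Hypothetical helper for illustration; returns the centroid, the unit
    direction of the first principal component, and a boolean mask that is
    True for points on the negative side of the hyperplane.
    """
    points = np.asarray(points, dtype=float)

    # Centroid of the data set.
    centroid = points.mean(axis=0)
    centered = points - centroid

    # The first right singular vector of the centered data is the first
    # principal component (its sign is arbitrary).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]

    # Signed distance of every point from the hyperplane.
    offsets = centered @ direction

    # Assumption: points exactly on the hyperplane go to the non-negative side.
    negative_side = offsets < 0
    return centroid, direction, negative_side


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data stretched along the first axis.
    data = rng.normal(size=(200, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])
    centroid, direction, negative_side = pca_split(data)
    print("negative side:", int(negative_side.sum()))
    print("non-negative side:", int((~negative_side).sum()))
```

Applying such a split recursively to each subset would yield a binary tree, but how E-MP organizes its tree and handles queries near a partition boundary is not specified in the abstract, so the sketch stops at one split.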