A partitioning method for high dimensional data

Authors:
Seunghoon Lee; Sung-Woo Bang; Bo-Keong Kim; Jaekwang Kim; Jee-Hyong Lee
Affiliations:
Sungkyunkwan University, Jangan-gu, Suwon, Gyeonggi-do, Republic of Korea (all authors)
Venue:
Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication
Year:
2010

Abstract

Nearest neighbor search in high-dimensional space is an important operation in many applications, such as data mining and multimedia databases. Computing the similarity between a point and every other point in a high-dimensional space incurs a high computational cost, so index structures are frequently used to reduce it. Most of these index structures are built by partitioning the data set according to a specific criterion. However, because the resulting partitions are disjoint, partitioning approaches can fail to find the true nearest neighbor: a query lying near a partition boundary may have its nearest neighbor in an adjacent partition that is never searched. In this paper, we propose an Error Minimizing Partitioning (E-MP) method with a novel tree structure that minimizes this failure in finding nearest neighbors. E-MP divides the data into subsets while taking the distribution of the data set into account. To partition a data set, the proposed method first finds the first principal component of the data set using principal component analysis (PCA). It then computes the centroid of the data set. Finally, it chooses as the partitioning hyperplane the one that passes through the centroid and is perpendicular to the first principal component vector. We also compare the proposed method with existing methods to verify its usefulness.
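As a rough illustration of the partitioning step described above, the Python sketch below splits a point set with the hyperplane that passes through its centroid and is perpendicular to its first principal component. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`centroid`, `first_principal_component`, `split`) and parameters (`iters`, `seed`) are hypothetical; the first principal component is obtained by power iteration on the covariance matrix instead of a full PCA routine; points lying exactly on the hyperplane are assigned to the right-hand subset; and the E-MP tree structure itself is not modeled.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]

# Hypothetical helper names; power iteration stands in for a full PCA routine.


def centroid(points: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty point set."""
    n, d = len(points), len(points[0])
    return [sum(p[j] for p in points) / n for j in range(d)]


def first_principal_component(points: Sequence[Sequence[float]],
                              iters: int = 200, seed: int = 0) -> Vector:
    """Unit vector along the direction of largest variance.

    Power iteration on the (unscaled) covariance matrix; the 1/n factor is
    omitted because it does not change the eigenvectors.
    """
    c = centroid(points)
    d = len(c)
    centered = [[p[j] - c[j] for j in range(d)] for p in points]
    cov = [[sum(x[i] * x[j] for x in centered) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        # Multiply by the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points coincide: no preferred direction
            break
        v = [x / norm for x in w]
    return v


def split(points: Sequence[Sequence[float]]
          ) -> Tuple[Vector, Vector, List[Sequence[float]], List[Sequence[float]]]:
    """Partition points by the hyperplane (x - c) . v = 0.

    c is the centroid and v the first principal component, so the hyperplane
    passes through the centroid and is perpendicular to v.
    """
    c = centroid(points)
    v = first_principal_component(points)
    left, right = [], []
    for p in points:
        # Sign of the projection of (p - c) onto v gives the side of the plane.
        s = sum((p[j] - c[j]) * v[j] for j in range(len(c)))
        (left if s < 0 else right).append(p)
    return c, v, left, right


if __name__ == "__main__":
    # Points stretched along the x-axis, so the first PC is close to +/-x.
    pts = [[-3.0, 0.1], [-2.0, -0.1], [-1.0, 0.0],
           [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]]
    c, v, left, right = split(pts)
    print(len(left), len(right))  # 3 3
```

Applying `split` recursively to each subset would produce a binary partition tree. The abstract does not specify how E-MP organizes or searches that tree to reduce nearest-neighbor failures, so the sketch stops at a single split.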