A partitioning method for high dimensional data

  • Authors:
  • Seunghoon Lee;Sung-Woo Bang;Bo-Keong Kim;Jaekwang Kim;Jee-Hyong Lee

  • Affiliations:
  • Sungkyunkwan University, Jangan-gu, Suwon, Gyeunggi-do, Republic of Korea;Sungkyunkwan University, Jangan-gu, Suwon, Gyeunggi-do, Republic of Korea;Sungkyunkwan University, Jangan-gu, Suwon, Gyeunggi-do, Republic of Korea;Sungkyunkwan University, Jangan-gu, Suwon, Gyeunggi-do, Republic of Korea;Sungkyunkwan University, Jangan-gu, Suwon, Gyeunggi-do, Republic of Korea

  • Venue:
  • Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication
  • Year:
  • 2010

Abstract

Nearest neighbor search in high-dimensional space is an important operation in many applications, such as data mining and multimedia databases. Evaluating the similarity of a point to all other points in a high-dimensional space is computationally expensive. To reduce this cost, index structures are frequently used. Most of these index structures are built by partitioning the data set according to a specific criterion. However, because the partitions are disjoint, partitioning approaches can fail to find the true nearest neighbor. In this paper, we propose an Error Minimizing Partitioning (E-MP) method with a novel tree structure, which minimizes such failures in finding nearest neighbors. E-MP divides the data into subsets by taking the distribution of the data set into account. To partition a data set, the proposed method first finds the first principal component of the data set using principal component analysis (PCA). It then finds the centroid of the data set. Finally, it chooses the partitioning hyperplane that passes through the centroid and is perpendicular to the principal component vector. We also compare the proposed method with existing methods to verify its usability.
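
The single partitioning step described in the abstract (first principal component, centroid, perpendicular hyperplane) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' E-MP implementation: the function name `split_by_principal_hyperplane` is hypothetical, the principal component is obtained here through an SVD of the centered data, and the tree structure and any error-minimizing logic from the paper are not reproduced.

```python
import numpy as np


def split_by_principal_hyperplane(points):
    """Split points with the hyperplane through the centroid whose
    normal is the first principal component (hypothetical helper)."""
    points = np.asarray(points, dtype=float)

    # Centroid of the data set.
    centroid = points.mean(axis=0)
    centered = points - centroid

    # PCA via SVD of the centered data: the rows of vt are the principal
    # directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]

    # Signed offset of each point from the hyperplane along its normal.
    offsets = centered @ direction

    # Assumption: points lying exactly on the hyperplane go to the second subset.
    first = points[offsets < 0]
    second = points[offsets >= 0]
    return centroid, direction, first, second


# Usage on synthetic data that is stretched along the first axis.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])
centroid, direction, first, second = split_by_principal_hyperplane(data)
print(first.shape[0], second.shape[0])
```

The sign of a principal direction is arbitrary, so which subset is returned first may differ between runs or libraries; the partition itself is unaffected.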