On accessing data in high-dimensional spaces: a comparative study of three space partitioning strategies

  • Authors:
  • Jack Lukaszuk; Ratko Orlandic

  • Affiliations:
  • Department of Computer Science, Illinois Institute of Technology, 10 West 31st Street, Room 236SB, Chicago, IL; Department of Computer Science, Illinois Institute of Technology, 10 West 31st Street, Room 236SB, Chicago, IL

  • Venue:
  • Journal of Systems and Software - Special issue: Performance modeling and analysis of computer systems and networks
  • Year:
  • 2004

Abstract

While experience shows that contemporary multi-dimensional access methods perform poorly in high-dimensional spaces, little is known about the underlying causes of this important problem. One factor that has a profound effect on the performance of a multi-dimensional structure in high-dimensional situations is its space partitioning strategy. This paper investigates the partitioning strategies of KDB-trees, the Pyramid Technique, and a new point access method called the Θs Technique. The paper reveals important dimensionality problems associated with these strategies and shows how each strategy affects retrieval performance across a range of spaces with varying dimensionalities. The Pyramid Technique, which is frequently regarded as the state-of-the-art access method for high-dimensional data, suffers from numerous problems that become particularly severe with highly skewed data in heavily sparse spaces. Although the partitioning strategy of KDB-trees incurs several problems in high-dimensional spaces, it exhibits a remarkable adaptability to changing data distributions. However, the experimental evidence gathered on both simulated and real data sets shows that the Θs Technique generally outperforms the other two schemes in high-dimensional spaces, usually by a significant margin.