Querying high-dimensional data in single-dimensional space

  • Authors:
  • Cui Yu;Stéphane Bressan;Beng Chin Ooi;Kian-Lee Tan

  • Affiliations:
  • Department of Computer Science, Monmouth University, West Long Branch, NJ 07764, USA;Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543, Singapore;Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543, Singapore;Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543, Singapore

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2004

Abstract

In this paper, we propose a new tunable index scheme, called iMinMax($\theta$), that maps each point in a high-dimensional space to a single-dimensional value determined by the point's maximum or minimum value among all dimensions. By varying the tuning “knob” $\theta$, we obtain different families of iMinMax structures that are optimized for different data distributions. The transformed data can then be indexed using existing single-dimensional indexing structures such as the B+-tree. Queries in the high-dimensional space are transformed into queries in the single-dimensional space and evaluated there. We present efficient algorithms for evaluating window queries as range queries in the single-dimensional space. We conducted an extensive performance study to evaluate the effectiveness of the proposed schemes. Our results show that iMinMax($\theta$) outperforms existing techniques, including the Pyramid scheme and the VA-file, by a wide margin. We then describe how iMinMax can be used in approximate K-nearest neighbor (KNN) search, and we present a comparative study against the recently proposed iDistance, a specialized KNN indexing method.
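
The mapping and query transformation summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: points are normalized to $[0,1]^d$; a point is keyed on its minimum coordinate when $x_{min} + \theta < 1 - x_{max}$ and on its maximum coordinate otherwise; a sorted list searched with `bisect` stands in for a B+-tree; and the window query issues one conservative range per dimension followed by a filter, rather than the tightened subqueries the paper presents. The names `iminmax_key`, `build_index`, and `window_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def iminmax_key(point, theta=0.0):
    """Map a point in [0, 1]^d to one key of the form (dimension + value)."""
    dims = range(len(point))
    d_min = min(dims, key=point.__getitem__)
    d_max = max(dims, key=point.__getitem__)
    x_min, x_max = point[d_min], point[d_max]
    # Assumed decision rule: theta shifts the balance between keying on
    # the minimum coordinate and keying on the maximum coordinate.
    if x_min + theta < 1.0 - x_max:
        return d_min + x_min
    return d_max + x_max


def build_index(points, theta=0.0):
    """Sorted (key, point) pairs; a stand-in for a B+-tree on the keys."""
    return sorted((iminmax_key(p, theta), p) for p in points)


def window_query(index, low, high):
    """Answer the window [low, high] with one 1-D range scan per dimension.

    A point keyed on dimension j has key j + x_j, and x_j must lie in
    [low[j], high[j]] for the point to be in the window, so scanning
    [j + low[j], j + high[j]] for every j cannot miss an answer.
    Candidates are then filtered against the full window.
    """
    keys = [k for k, _ in index]
    hits = set()  # positions in the index, so no point is reported twice
    for j in range(len(low)):
        start = bisect_left(keys, j + low[j])
        end = bisect_right(keys, j + high[j])
        for pos in range(start, end):
            p = index[pos][1]
            if all(lo <= x <= hi for x, lo, hi in zip(p, low, high)):
                hits.add(pos)
    return [index[pos][1] for pos in sorted(hits)]
```

For example, with `theta=0.0` the point `(0.25, 0.75)` is keyed on its maximum (key `1.75`), while `theta=-0.5` moves it to its minimum (key `0.25`), which shows how the knob redistributes points across the single-dimensional key space.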