Approximate Algorithms for Distance-Based Queries in High-Dimensional Data Spaces Using R-Trees

  • Authors:
  • Antonio Corral;Joaquín Cañadas;Michael Vassilakopoulos

  • Venue:
  • ADBIS '02 Proceedings of the 6th East European Conference on Advances in Databases and Information Systems
  • Year:
  • 2002

Abstract

In modern database applications, the similarity or dissimilarity of complex objects is examined by performing distance-based queries (DBQs) on data of high dimensionality. The R-tree and its variations are commonly cited multidimensional access methods that can be used for answering such queries. Although the related algorithms work well for low-dimensional data spaces, their performance degrades as the number of dimensions increases (the dimensionality curse). To obtain acceptable response times in high-dimensional data spaces, algorithms that return approximate solutions can be used. Three approximation techniques (α-allowance, N-consider and M-consider) and the respective recursive branch-and-bound algorithms for DBQs are presented and studied in this paper. We investigate the performance of these algorithms for the most representative DBQs (the K-nearest neighbors query and the K-closest pairs query) in high-dimensional data spaces, where the point data sets are indexed by tree-like structures belonging to the R-tree family: R*-trees and X-trees. The search strategy is tuned according to several parameters in order to examine the trade-off between cost (I/O activity and response time) and accuracy of the result. The outcome of the experimental evaluation is the identification of the best-performing approximate DBQ algorithm for large high-dimensional point data sets.
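To make the cost/accuracy trade-off concrete, the following is a minimal Python sketch of a recursive branch-and-bound K-nearest-neighbors search with a relaxed pruning rule. It is not the authors' algorithm: the abstract does not define the allowance technique, so the rule used here (skip a subtree when its minimum distance exceeds (1 - gamma) times the current K-th best distance) is an assumption, as are the parameter name `gamma`, all function names, and the toy median-split bounding-box tree, which only stands in for an R*-tree or X-tree.

```python
import heapq

def build(points, leaf_size=8, depth=0):
    """Toy bounding-box tree built by median splits (a stand-in, not an R*-tree)."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    if len(points) <= leaf_size:
        return {"lo": lo, "hi": hi, "points": points, "children": None}
    axis = depth % dims
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    children = [build(pts[:mid], leaf_size, depth + 1),
                build(pts[mid:], leaf_size, depth + 1)]
    return {"lo": lo, "hi": hi, "points": None, "children": children}

def dist(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def mindist(q, node):
    """Smallest Euclidean distance from q to the node's bounding box."""
    total = 0.0
    for x, lo, hi in zip(q, node["lo"], node["hi"]):
        d = lo - x if x < lo else (x - hi if x > hi else 0.0)
        total += d * d
    return total ** 0.5

def _search(node, q, k, gamma, best, stats):
    stats["visited"] += 1
    if node["children"] is None:
        for p in node["points"]:
            d = dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))      # max-heap via negated distance
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
        return
    # Visit the closer child first so the pruning distance shrinks early.
    for child in sorted(node["children"], key=lambda c: mindist(q, c)):
        if len(best) == k:
            z = -best[0][0]                        # current K-th best distance
            # Assumed relaxed rule: gamma = 0 is exact, larger gamma prunes more.
            if mindist(q, child) > (1.0 - gamma) * z:
                continue
        _search(child, q, k, gamma, best, stats)

def approx_knn(root, q, k, gamma=0.0):
    """Return ([(distance, point), ...] sorted ascending, number of nodes visited)."""
    best, stats = [], {"visited": 0}
    _search(root, q, k, gamma, best, stats)
    return sorted((-nd, p) for nd, p in best), stats["visited"]
```

With `gamma=0.0` the search is exact. For `0 < gamma < 1` it may miss true neighbors, but under this assumed rule the K-th distance it returns is at most 1 / (1 - gamma) times the exact K-th distance. Comparing the visited-node counts returned for different `gamma` values gives a rough proxy for the I/O cost discussed in the abstract.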