Similarity Join for Low- and High-Dimensional Data

  • Authors:
  • Dmitri V. Kalashnikov; Sunil Prabhakar

  • Venue:
  • DASFAA '03: Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
  • Year:
  • 2003

Abstract

The efficient processing of similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focused on the execution of high-dimensional joins over large amounts of disk-based data. The increasing sizes of main memory available on current computers, and the need for efficient processing of spatial joins, suggest that spatial joins for a large class of problems can be processed in main memory. In this paper we develop two new spatial join algorithms, the Grid-join and the EGO*-join, and study their performance in comparison to the state-of-the-art EGO-join algorithm and the RSJ algorithm. Through evaluation we explore the domain of applicability of each algorithm and provide recommendations for the choice of join algorithm depending upon the dimensionality of the data as well as the critical ε parameter. We also point out the significance of the choice of this parameter for ensuring that the selectivity achieved is reasonable.
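To make the role of ε concrete, below is a minimal sketch of a generic uniform-grid ε-join in Python. It is not the authors' Grid-join, since the abstract gives no details of that algorithm. The function names `grid_epsilon_join` and `selectivity` are hypothetical. The sketch assumes Euclidean distance and takes selectivity to mean the fraction of all |A|·|B| candidate pairs that end up in the join result.

```python
import math
from collections import defaultdict
from itertools import product


def grid_epsilon_join(A, B, eps):
    """Return all index pairs (i, j) with Euclidean dist(A[i], B[j]) <= eps.

    Hypothetical sketch of a generic grid-based epsilon-join (not the
    paper's Grid-join). Points are hashed into cells of side eps, so two
    points within eps of each other fall in the same or an adjacent cell
    (ignoring floating-point edge cases exactly on cell boundaries).
    """
    if eps <= 0:
        raise ValueError("eps must be positive")

    def cell(p):
        # Integer cell coordinates of point p in a grid with side length eps.
        return tuple(math.floor(x / eps) for x in p)

    # Build phase: index every point of B by its grid cell.
    grid = defaultdict(list)
    for j, q in enumerate(B):
        grid[cell(q)].append(j)

    eps2 = eps * eps  # compare squared distances to avoid sqrt
    result = []
    # Probe phase: for each point of A, scan its own cell and all 3^d neighbors.
    for i, p in enumerate(A):
        c = cell(p)
        for offset in product((-1, 0, 1), repeat=len(c)):
            neighbor = tuple(ci + oi for ci, oi in zip(c, offset))
            for j in grid.get(neighbor, ()):
                q = B[j]
                if sum((x - y) ** 2 for x, y in zip(p, q)) <= eps2:
                    result.append((i, j))
    return result


def selectivity(num_result_pairs, n_a, n_b):
    """Fraction of the n_a * n_b candidate pairs that satisfy the join."""
    return num_result_pairs / (n_a * n_b)


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 1.0)]
    B = [(0.05, 0.0), (0.5, 0.5), (1.0, 1.05)]
    pairs = grid_epsilon_join(A, B, eps=0.1)
    print(sorted(pairs))
    print(selectivity(len(pairs), len(A), len(B)))
```

With the cell side set to ε, each probe in this sketch inspects 3^d cells, a count that grows exponentially with the dimensionality d; that is one reason a plain grid of this kind is better suited to low-dimensional data. The same example also shows the ε trade-off: a large ε drives the selectivity toward 1 (almost every pair joins), while a very small ε returns almost no pairs.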