Fast similarity join for multi-dimensional data

  • Authors: Dmitri V. Kalashnikov; Sunil Prabhakar

  • Affiliations: Department of Computer Science, University of California, Irvine, 4300 Calit2 Building, Irvine, CA 92697, USA; Department of Computer Sciences, Purdue University, 250 N. University Street, West Lafayette, IN 47907, USA

  • Venue: Information Systems
  • Year: 2007

Abstract

The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focused on the execution of high-dimensional joins over large amounts of disk-based data. The increasing size of main memory available on current computers, together with the need for efficient processing of spatial joins, suggests that spatial joins for a large class of problems can be processed in main memory. In this paper, we develop two new in-memory spatial join algorithms, the Grid-join and the EGO*-join, and study their performance. Through evaluation, we explore the domain of applicability of each approach and provide recommendations for the choice of a join algorithm depending on the dimensionality of the data as well as the expected selectivity of the join. We show that the two proposed join techniques substantially outperform the state-of-the-art join algorithm, the EGO-join.