Robust and efficient algorithms for rank join evaluation

  • Authors:
  • Jonathan Finger;Neoklis Polyzotis

  • Affiliations:
  • University of California Santa Cruz, Santa Cruz, CA, USA;University of California Santa Cruz, Santa Cruz, CA, USA

  • Venue:
  • Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

In the rank join problem we are given a relational join R1 x R2 and a function that assigns numeric scores to the join tuples, and the goal is to return the tuples with the highest score. This problem lies at the core of processing top-k SQL queries, and recent studies have introduced specialized operators that solve the rank join problem by accessing only a subset of the input tuples. A desirable property for such operators is instance-optimality, i.e., their I/O cost should remain within a factor of the optimal for different inputs. However, a recent theoretical study has shown that existing rank join operators are not instance-optimal even though they have been shown to perform well empirically. The same study proposed the PBRJRRoverFR operator that was proved to be instance-optimal, but its performance was not tested empirically and in fact it was hinted that its complexity can be high. Thus, the following important question is raised: Is it possible to design a rank join operator that is both instance-optimal and computationally efficient? In this paper we provide an answer to this challenging question. We perform an empirical study of PBRJRRoverFR and show that its computational cost can offset the benefits of instance-optimality. Using the insights gained by the study, we develop the novel FRPA operator that addresses the efficiency bottlenecks of PBRJRRoverFR. We prove that FRPA is instance-optimal in general and more specifically that it never performs more I/O than PBRJRRoverFR. FRPA is the first operator that possesses these properties and is thus of interest in the theoretical study of rank join operators. We further identify cases where the overhead of FRPA becomes significant, and propose the FRPA operator that automatically adapts its overhead to the characteristics of the input. An extensive experimental study validates the effectiveness of the new operators and demonstrates that they offer significant performance improvements (up to an order of magnitude) over the state-of-the-art.