Robust and efficient algorithms for rank join evaluation

Authors:
Jonathan Finger; Neoklis Polyzotis
Affiliations:
University of California Santa Cruz, Santa Cruz, CA, USA (both authors)
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data
Year:
2009

Abstract

In the rank join problem we are given a relational join R1 ⋈ R2 and a function that assigns numeric scores to the join tuples, and the goal is to return the k join tuples with the highest scores. This problem lies at the core of processing top-k SQL queries, and recent studies have introduced specialized operators that solve the rank join problem by accessing only a subset of the input tuples. A desirable property for such operators is instance-optimality, i.e., their I/O cost should remain within a constant factor of the optimal cost on every input. However, a recent theoretical study has shown that existing rank join operators are not instance-optimal, even though they perform well empirically. The same study proposed the PBRJ_FR^RR operator and proved it instance-optimal, but its performance was not tested empirically, and the study hinted that its computational complexity can be high. This raises an important question: is it possible to design a rank join operator that is both instance-optimal and computationally efficient? In this paper we answer this question. We perform an empirical study of PBRJ_FR^RR and show that its computational cost can offset the benefits of instance-optimality. Using the insights gained from this study, we develop the novel FRPA operator, which addresses the efficiency bottlenecks of PBRJ_FR^RR. We prove that FRPA is instance-optimal in general and, more specifically, that it never performs more I/O than PBRJ_FR^RR. FRPA is the first operator that possesses these properties and is thus of interest in the theoretical study of rank join operators. We further identify cases where the overhead of FRPA becomes significant, and propose the a-FRPA operator, which automatically adapts its overhead to the characteristics of the input.
An extensive experimental study validates the effectiveness of the new operators and demonstrates that they offer significant performance improvements (up to an order of magnitude) over the state of the art.
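The problem statement above can be made concrete with a small sketch. The Python code below is illustrative only and is not the paper's FRPA, a-FRPA or PBRJ_FR^RR operator: it swaps in the simpler corner bound used by earlier rank join operators such as HRJN, combined with round-robin pulling. It assumes two inputs given as lists of (join key, score) pairs, each sorted by descending score, an equi-join on the key, and the sum of the two scores as the scoring function; the names `corner_bound_rank_join` and `naive_rank_join` are hypothetical. The operator stops reading as soon as the k-th best result found so far scores at least as high as the bound on any result not yet produced, and it reports how many tuples it read from each input, i.e. the I/O cost that instance-optimality is about.

```python
import heapq
from collections import defaultdict
from typing import List, Tuple

Row = Tuple[str, float]          # (join key, score); inputs sorted by score, descending
Result = Tuple[float, Row, Row]  # (combined score, row from R1, row from R2)


def naive_rank_join(r1: List[Row], r2: List[Row], k: int) -> List[Result]:
    """Baseline: materialize the whole join of R1 and R2, then keep the k best."""
    joined = [(s1 + s2, (a, s1), (b, s2))
              for a, s1 in r1 for b, s2 in r2 if a == b]
    joined.sort(key=lambda res: res[0], reverse=True)
    return joined[:k]


def corner_bound_rank_join(r1: List[Row], r2: List[Row], k: int):
    """Round-robin pulling plus the corner bound (hypothetical helper).

    Returns (top-k results sorted by score, number of tuples read per input).
    """
    inputs = (r1, r2)
    depth = [0, 0]                                 # tuples read from each input
    seen = (defaultdict(list), defaultdict(list))  # join key -> scores read so far
    best: List[Tuple[float, int, Result]] = []     # min-heap holding the k best results
    tie = 0                                        # unique tie-breaker for heap entries
    side = 0
    if not r1 or not r2 or k <= 0:
        return [], depth
    while True:
        # Corner bound: a result not yet produced contains an unread tuple from
        # some input i, scoring at most the last score read from i (inputs are
        # sorted); its partner scores at most the top score of the other input.
        if depth[0] and depth[1]:
            bound = float("-inf")
            for i in (0, 1):
                if depth[i] < len(inputs[i]):      # input i still has unread tuples
                    bound = max(bound, inputs[i][depth[i] - 1][1] + inputs[1 - i][0][1])
        else:
            bound = float("inf")                   # nothing read yet from one input
        if len(best) == k and best[0][0] >= bound:
            break                                  # no unseen result can beat the k-th best
        if depth[0] == len(r1) and depth[1] == len(r2):
            break                                  # both inputs exhausted
        if depth[side] == len(inputs[side]):       # skip an exhausted input
            side = 1 - side
        key, score = inputs[side][depth[side]]     # "pull" the next tuple
        depth[side] += 1
        seen[side][key].append(score)
        for other in seen[1 - side][key]:          # join with partners read earlier
            s1, s2 = (score, other) if side == 0 else (other, score)
            res = (s1 + s2, (key, s1), (key, s2))
            tie += 1
            if len(best) < k:
                heapq.heappush(best, (res[0], tie, res))
            elif res[0] > best[0][0]:
                heapq.heapreplace(best, (res[0], tie, res))
        side = 1 - side                            # round-robin: alternate inputs
    top = sorted((entry[2] for entry in best), key=lambda res: res[0], reverse=True)
    return top, depth


r1 = [("a", 10.0), ("b", 9.0), ("c", 3.0), ("d", 2.0), ("e", 1.0)]
r2 = [("b", 10.0), ("a", 8.0), ("e", 2.0), ("d", 1.0), ("c", 1.0)]
top, depth = corner_bound_rank_join(r1, r2, k=2)
print([res[0] for res in top], depth)                  # [19.0, 18.0] [3, 2]
print([res[0] for res in naive_rank_join(r1, r2, 2)])  # [19.0, 18.0]
```

On this toy input the sketch stops after reading 5 of the 10 input tuples, while the naive baseline reads all of them, and both return the scores 19 and 18. The trade-off the abstract describes sits in the bounding step: a tighter bound can stop reading earlier, but it may cost more to recompute after every pulled tuple.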