Optimal algorithms for evaluating rank joins in database systems

Authors:
Karl Schnaitter; Neoklis Polyzotis
Affiliations:
University of California, Santa Cruz, Santa Cruz, CA; University of California, Santa Cruz, Santa Cruz, CA
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2008

Citing 22
Cited 5

Query evaluation techniques for large databases

ACM Computing Surveys (CSUR)
Optimizing queries over multimedia repositories

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Combining fuzzy information from multiple systems

Journal of Computer and System Sciences
Minimal probing: supporting expensive predicates for top-k queries

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Evaluating Top-k Selection Queries

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Supporting Incremental Join Queries on Ranked Inputs

Proceedings of the 27th International Conference on Very Large Data Bases
Eager Aggregation and Lazy Aggregation

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Optimal aggregation algorithms for middleware

Journal of Computer and System Sciences - Special issue on PODS 2001
Evaluating Top-k Queries over Web-Accessible Databases

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Rank-aware query optimization

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Supporting top-k join queries in relational databases

The VLDB Journal — The International Journal on Very Large Data Bases
RankSQL: query algebra and optimization for relational top-k queries

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Supporting ad-hoc ranking aggregates

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
IO-Top-k: index-access optimized top-k query processing

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Adaptive rank-aware query optimization in relational databases

ACM Transactions on Database Systems (TODS)
Efficient top-k aggregation of ranked inputs

ACM Transactions on Database Systems (TODS)
Joining ranked inputs in practice

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Depth estimation for ranking query optimization

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Evaluating rank joins with optimal cost

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Confidence-Aware Join Algorithms

ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Robust and efficient algorithms for rank join evaluation

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Incrementally computing ordered answers of acyclic conjunctive queries

NGITS'06 Proceedings of the 6th international conference on Next Generation Information Technologies and Systems

Parallel data access for multiway rank joins

ICWE'11 Proceedings of the 11th international conference on Web engineering
Top-k linked data query processing

ESWC'12 Proceedings of the 9th international conference on The Semantic Web: research and applications
Efficient execution of top-k SPARQL queries

ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I
Extending SPARQL algebra to support efficient evaluation of top-k SPARQL queries

Search Computing
Efficient top-k spatial distance joins

SSTD'13 Proceedings of the 13th international conference on Advances in Spatial and Temporal Databases

Abstract

In the rank join problem, we are given a set of relations and a scoring function, and the goal is to return the join results with the top k scores. It is often the case in practice that the inputs may be accessed in ranked order and the scoring function is monotonic. These conditions allow for efficient algorithms that solve the rank join problem without reading all of the input. In this article, we present a thorough analysis of such rank join algorithms. A strong point of our analysis is that it is based on a more general problem statement than previous work, making it more relevant to the execution model that is employed by database systems. One of our results indicates that the well-known HRJN algorithm has shortcomings, because it does not stop reading its input as soon as possible. We find that it is NP-hard to overcome this weakness in the general case, but cases of limited query complexity are tractable. We prove the latter with an algorithm that infers provably tight bounds on the potential benefit of reading more input in order to stop as soon as possible. As a result, the algorithm achieves a cost that is within a constant factor of optimal.
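To make the setting in the abstract concrete, the following is a minimal sketch of a two-input rank join that stops before reading all of its input. It is not the article's algorithm: it is a simplified HRJN-style loop under stated assumptions, namely that each input is a list of `(join_key, score)` pairs sorted by descending score, that the scoring function is the sum of the two scores (which is monotonic), and that the inputs are read in simple alternation. The stopping test uses a threshold often described as a corner bound, which is safe but can be loose, so the loop may read more input than strictly necessary. The function name `rank_join_top_k` and all variable names are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def rank_join_top_k(left, right, k):
    """Return (top_k_results, depths) for an equi-join on join_key.

    left, right: lists of (join_key, score), sorted by score descending.
    Each result is (total_score, join_key, left_score, right_score).
    depths is the number of tuples read from each input before stopping.
    """
    inputs = (left, right)
    seen = (defaultdict(list), defaultdict(list))  # join_key -> scores read so far
    pos = [0, 0]                   # tuples read from each input
    top = [math.inf, math.inf]     # first (highest) score read from each input
    last = [math.inf, math.inf]    # most recent (lowest) score read from each input
    results = []                   # every join result discovered so far
    turn = 0                       # which input to read next

    def threshold():
        # Upper bound on the score of any join result not yet discovered.
        # Such a result uses at least one unread tuple from some input i,
        # whose score is at most last[i]; its partner scores at most top[1 - i].
        bound = -math.inf
        for i in (0, 1):
            if pos[i] < len(inputs[i]):  # input i still has unread tuples
                bound = max(bound, last[i] + top[1 - i])
        return bound

    while True:
        best = heapq.nlargest(k, results)
        # Stop once k buffered results score at least as high as the bound.
        if len(best) >= k and best[-1][0] >= threshold():
            return best, tuple(pos)
        if pos[0] >= len(left) and pos[1] >= len(right):
            return best, tuple(pos)  # inputs exhausted; fewer than k may exist

        # Alternate between inputs, skipping one that is exhausted.
        i = turn if pos[turn] < len(inputs[turn]) else 1 - turn
        turn = 1 - i
        key, score = inputs[i][pos[i]]
        if pos[i] == 0:
            top[i] = score
        last[i] = score
        pos[i] += 1

        # Join the new tuple with matching tuples already read from the other input.
        for other in seen[1 - i][key]:
            l, r = (score, other) if i == 0 else (other, score)
            results.append((l + r, key, l, r))
        seen[i][key].append(score)


left = [("a", 10), ("b", 9), ("c", 2), ("d", 1)]
right = [("b", 10), ("a", 8), ("d", 3), ("c", 1)]
print(rank_join_top_k(left, right, 1))  # ([(19, 'b', 9, 10)], (2, 2))
```

On this toy data the top result is returned after reading two tuples from each input rather than all four. Recomputing the top k from the full buffer on every iteration keeps the sketch short; a real operator would maintain a bounded priority queue instead.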