Evaluating rank joins with optimal cost

Authors:
Karl Schnaitter; Neoklis Polyzotis
Affiliations:
UC Santa Cruz, Santa Cruz, CA, USA; UC Santa Cruz, Santa Cruz, CA, USA
Venue:
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2008

Abstract

In the rank join problem, we are given a set of relations and a scoring function, and the goal is to return the join results with the top K scores. In practice, the inputs can often be accessed in ranked order and the scoring function is monotonic. These conditions allow for efficient algorithms that solve the rank join problem without reading all of the input. In this paper, we present a thorough analysis of such rank join algorithms. A strong point of our analysis is that it is based on a more general problem statement than previous work, making it more relevant to the execution model employed by database systems. One of our results indicates that the well-known HRJN algorithm has a shortcoming: it does not stop reading its input as soon as possible. We find that it is NP-hard to overcome this weakness in the general case, but cases of limited query complexity are tractable. We prove the latter with an algorithm that infers provably tight bounds on the potential benefit of reading more input, which allows it to stop as soon as possible. As a result, the algorithm achieves a cost that is within a constant factor of optimal.
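
To make the problem setting concrete, the following is a minimal sketch of an HRJN-style rank join, not the optimal algorithm proposed in the paper. It assumes exactly two inputs, an equi-join on a single key, a sum as the monotonic scoring function, and round-robin reads; the function name `rank_join`, the tuple layout, and the sample data are hypothetical choices made for illustration. The sketch reads both inputs in descending score order and stops once K buffered results score at least as high as an upper bound on any result not yet produced.

```python
def rank_join(left, right, k):
    """HRJN-style sketch (illustrative assumptions, not the paper's algorithm).

    left, right: lists of (join_key, score), sorted by score descending.
    Score of a join result = left score + right score (a monotonic function).
    Assumes k >= 1. Returns (top-k results as (score, key), tuples read per input).
    """
    inputs = (left, right)
    pos = [0, 0]            # number of tuples read from each input so far
    seen = ({}, {})         # per input: join_key -> list of scores already read
    results = []            # join results produced so far, as (score, key)

    def threshold():
        # Upper bound on the score of any join result not yet produced.
        bound = float("-inf")
        for i in (0, 1):
            j = 1 - i
            if pos[i] == len(inputs[i]) or not inputs[j]:
                continue    # no unseen tuple in input i can yield a new result
            if pos[i] == 0 or pos[j] == 0:
                return float("inf")   # nothing read yet, so no bound is known
            last_i = inputs[i][pos[i] - 1][1]   # unseen tuples of i score <= this
            top_j = inputs[j][0][1]             # any tuple of j scores <= this
            bound = max(bound, last_i + top_j)
        return bound

    turn = 0
    while True:
        results.sort(reverse=True)
        t = threshold()
        if len(results) >= k and results[k - 1][0] >= t:
            break           # the k-th best result cannot be beaten any more
        if t == float("-inf"):
            break           # no further results are possible
        if pos[turn] == len(inputs[turn]):
            turn = 1 - turn  # skip an exhausted input
        key, score = inputs[turn][pos[turn]]
        pos[turn] += 1
        # Join the new tuple with matching tuples already read from the other input.
        for other in seen[1 - turn].get(key, []):
            results.append((score + other, key))
        seen[turn].setdefault(key, []).append(score)
        turn = 1 - turn
    return results[:k], tuple(pos)


# Hypothetical sample data: (join_key, score), each list sorted by score descending.
left = [("a", 9), ("b", 8), ("c", 3), ("d", 1)]
right = [("b", 10), ("a", 7), ("d", 2), ("c", 1)]
print(rank_join(left, right, 1))   # ([(18, 'b')], (2, 2))
```

On this sample data the top result is found after reading two tuples from each input, without scanning the rest. The bound used here is the simple corner bound associated with HRJN-style operators; the abstract's point is that such a bound can be loose, so an operator of this kind may keep reading after the top K results are already determined.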