In the rank join problem we are given a relational join R1 ⋈ R2 and a function that assigns numeric scores to the join tuples, and the goal is to return the tuples with the highest scores. This problem lies at the core of processing top-k SQL queries, and recent studies have introduced specialized operators that solve the rank join problem by accessing only a subset of the input tuples. A desirable property for such operators is instance-optimality, i.e., their I/O cost should remain within a constant factor of the optimal on every input. However, a recent theoretical study has shown that existing rank join operators are not instance-optimal, even though they have been shown to perform well empirically. The same study proposed the PBRJ_FR^RR operator and proved it instance-optimal, but did not test its performance empirically and in fact hinted that its computational complexity can be high. This raises an important question: is it possible to design a rank join operator that is both instance-optimal and computationally efficient? In this paper we answer this question. We perform an empirical study of PBRJ_FR^RR and show that its computational cost can offset the benefits of instance-optimality. Using the insights gained from this study, we develop the novel FRPA operator, which addresses the efficiency bottlenecks of PBRJ_FR^RR. We prove that FRPA is instance-optimal in general and, more specifically, that it never performs more I/O than PBRJ_FR^RR. FRPA is the first operator that possesses these properties and is thus of interest in the theoretical study of rank join operators. We further identify cases where the overhead of FRPA becomes significant, and propose an adaptive variant of the FRPA operator that automatically adjusts its overhead to the characteristics of the input.
An extensive experimental study validates the effectiveness of the new operators and demonstrates that they offer significant performance improvements (up to an order of magnitude) over the state of the art.
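To make the problem setting concrete, the following is a minimal Python sketch of a rank join that stops before reading all of its input. It is not FRPA or PBRJ_FR^RR; it uses the simple "corner" bound with round-robin pulling, in the style of earlier rank join operators. The function name `rank_join`, the tuple layout `(join_key, score)`, and the choice of summation as the scoring function are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def rank_join(r1, r2, k):
    """Return (top-k join results, number of input tuples read).

    Assumptions of this sketch: r1 and r2 are lists of (join_key, score)
    sorted by score in descending order, and a join result's score is the
    sum of the two input scores.
    """
    if k <= 0:
        return [], 0
    inputs = (r1, r2)
    pos = [0, 0]  # number of tuples read so far from each input
    seen = (defaultdict(list), defaultdict(list))  # join_key -> scores read
    results = []  # min-heap with the best k results found so far

    def bound():
        # Corner bound: no join result that uses an unread tuple can score
        # higher than this value.
        best = float("-inf")
        for i in (0, 1):
            j = 1 - i
            if pos[i] < len(inputs[i]) and inputs[j]:
                # An unread tuple of input i scores at most the last score
                # read from i (or the top score if nothing was read yet).
                cap = inputs[i][pos[i] - 1][1] if pos[i] else inputs[i][0][1]
                best = max(best, cap + inputs[j][0][1])
        return best

    turn = 0
    while True:
        # Stop once the k-th best result cannot be beaten by unseen results.
        if len(results) == k and results[0][0] >= bound():
            break
        # Round-robin pulling; skip an input that is exhausted.
        if pos[turn] >= len(inputs[turn]):
            turn = 1 - turn
            if pos[turn] >= len(inputs[turn]):
                break  # both inputs are exhausted
        i = turn
        key, score = inputs[i][pos[i]]
        pos[i] += 1
        seen[i][key].append(score)
        # Join the new tuple with matching tuples already read from the
        # other input.
        for other in seen[1 - i].get(key, []):
            pair = (score, other) if i == 0 else (other, score)
            item = (score + other, key, pair)
            if len(results) < k:
                heapq.heappush(results, item)
            elif item > results[0]:
                heapq.heapreplace(results, item)
        turn = 1 - turn
    return sorted(results, reverse=True), sum(pos)


r1 = [("a", 10), ("b", 9), ("c", 2), ("d", 1)]
r2 = [("b", 10), ("a", 8), ("d", 3), ("c", 1)]
top, reads = rank_join(r1, r2, k=1)
# top == [(19, "b", (9, 10))] and reads == 4 (out of 8 input tuples)
```

On this toy input the sketch returns the best join tuple after reading half of the input. How many tuples such an operator reads depends on how tight its bound is and on the order in which it pulls from the inputs, which is where the instance-optimality question discussed in the abstract arises.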