Depth estimation for ranking query optimization

Authors: Karl Schnaitter; Joshua Spiegel; Neoklis Polyzotis
Affiliations: UC Santa Cruz; BEA Systems, Inc.; UC Santa Cruz
Venue: VLDB '07: Proceedings of the 33rd International Conference on Very Large Data Bases
Year: 2007

Abstract

A relational ranking query uses a scoring function to limit the results of a conventional query to a small number of the most relevant answers. The growing popularity of this query paradigm has led to specialized rank join operators that integrate the selection of top tuples with join processing. These operators access just "enough" of their input to generate just "enough" output, and can therefore speed up query evaluation significantly. The number of input tuples an operator accesses is called its input depth, and it is the dominant cost factor in rank join processing. This raises the problem of depth estimation, which is crucial for costing rank join operators during query compilation and thus for integrating them into optimized physical plans. We introduce an estimation methodology, termed Deep, for approximating the input depths of rank join operators in a physical execution plan. At the core of Deep lies a general, principled framework that formalizes depth computation in terms of the joint distribution of scores in the base tables. This framework yields a systematic estimation methodology that takes the characteristics of the data directly into account and thus enables more accurate estimates. We develop novel estimation algorithms that realize the formal Deep framework efficiently, and describe how they integrate on top of the statistics module of an existing query optimizer. We validate Deep with an extensive experimental study on data sets of varying characteristics. The results confirm the effectiveness of Deep as an estimation method and demonstrate its advantages over previously proposed techniques.
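To make "input depth" concrete, here is a minimal sketch of a two-way rank join that counts how many tuples it reads from each input before it can stop. It is not Deep and not the paper's code; it only illustrates the quantity Deep estimates. It assumes a sum scoring function, inputs already sorted by descending score, round-robin pulling, and a stopping threshold modeled on the hash rank join (HRJN) bound. The function name `rank_join` and the sample data are hypothetical.

```python
import heapq
from collections import defaultdict


def rank_join(left, right, k):
    """Top-k equi-join of two score-sorted inputs under score = ls + rs.

    left, right: lists of (join_key, score), sorted by score descending.
    Returns (results, depth_left, depth_right), where results is a list of
    (total_score, join_key, left_score, right_score) in descending order.
    """
    inputs = [left, right]
    depth = [0, 0]                              # tuples read from each input
    seen = [defaultdict(list), defaultdict(list)]  # join key -> scores read so far
    top = [None, None]                          # first (highest) score read per input
    last = [None, None]                         # latest (lowest) score read per input
    buffer = []                                 # candidate results as a max-heap on score
    results = []
    turn = 0
    while len(results) < k:
        # Round-robin, skipping an exhausted input; stop if both are exhausted.
        if depth[turn] >= len(inputs[turn]):
            turn = 1 - turn
            if depth[turn] >= len(inputs[turn]):
                break
        key, score = inputs[turn][depth[turn]]
        depth[turn] += 1
        if top[turn] is None:
            top[turn] = score
        last[turn] = score
        seen[turn][key].append(score)
        # Join the new tuple with every matching tuple already read from the other side.
        for other_score in seen[1 - turn][key]:
            ls, rs = (score, other_score) if turn == 0 else (other_score, score)
            heapq.heappush(buffer, (-(ls + rs), key, ls, rs))
        # Upper bound on the score of any join result not yet produced.
        if top[0] is None or top[1] is None:
            threshold = float("inf")
        else:
            threshold = max(top[0] + last[1], last[0] + top[1])
        # Buffered results that reach the bound are guaranteed to be in the top k.
        while buffer and -buffer[0][0] >= threshold and len(results) < k:
            neg, jkey, ls, rs = heapq.heappop(buffer)
            results.append((-neg, jkey, ls, rs))
        turn = 1 - turn
    # Both inputs exhausted: every remaining buffered result is final.
    while buffer and len(results) < k:
        neg, jkey, ls, rs = heapq.heappop(buffer)
        results.append((-neg, jkey, ls, rs))
    return results, depth[0], depth[1]


left = [("a", 10), ("b", 7), ("c", 5), ("d", 1)]
right = [("b", 9), ("a", 7), ("d", 6), ("c", 2)]
top2, depth_left, depth_right = rank_join(left, right, 2)
print(top2, depth_left, depth_right)
```

On this sample data, the top-2 answers (`a` with 17, `b` with 16) are produced after reading 3 of the 4 tuples on each side, and a top-1 query stops after 2 per side. Because these depths depend on `k` and on how scores are distributed in each input, the optimizer cannot read them off the plan; it has to estimate them before execution, which is the problem the abstract describes.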