Deriving predicate statistics in datalog

Authors:
Senlin Liang;Michael Kifer
Affiliations:
State University of New York at Stony Brook, Stony Brook, NY, USA;State University of New York at Stony Brook, Stony Brook, NY, USA
Venue:
Proceedings of the 12th international ACM SIGPLAN symposium on Principles and practice of declarative programming
Year:
2010

Abstract

Database query optimizers rely on data statistics to select query execution plans. Similar query optimization techniques are desirable for deductive databases, but applying them requires data statistics for Datalog predicates. The difficulty is that Datalog predicates can be recursive. In this paper, we propose an algorithm, called SDP, that estimates Datalog query sizes efficiently by maintaining statistical dependency information for derived predicates. Base predicate statistics are computed and summarized using dependency matrices, and derived predicate statistics are computed by evaluating rules abstractly, with rule bodies replaced by algebraic expressions over the dependency matrices. Recursive rules are handled by a fixed-point evaluation. Our experimental study shows that: 1) SDP produces better query size estimates than computing base predicate statistics and propagating them to derived predicates under the argument independence assumption; 2) the estimates largely preserve the relative order of real query sizes and thus can be used to guide cost-based query optimizers.
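
To make "abstract" rule evaluation with a fixed point concrete, here is a minimal Python sketch. It is not the SDP algorithm and uses no dependency matrices. It illustrates only the simpler baseline the abstract compares against: propagating sizes under uniformity and argument independence. The rules `path(X,Y) :- edge(X,Y)` and `path(X,Y) :- edge(X,Z), path(Z,Y)`, the function names, and the single shared `domain` of constants are all assumptions made for illustration.

```python
def join_size(r_size, s_size, domain):
    """Estimate |q| for q(X, Y) :- r(X, Z), s(Z, Y).

    Assumes Z is uniformly distributed over `domain` constants and that
    the arguments of r and s are independent (illustrative assumption).
    """
    estimate = r_size * s_size / domain
    # A binary predicate over `domain` constants has at most domain**2 tuples.
    return min(estimate, domain ** 2)


def union_size(a_size, b_size, domain):
    """Estimate |a UNION b| for two binary predicates.

    Treats membership in a and b as independent events over the
    domain**2 possible tuples, so the expected overlap is a*b/domain**2.
    """
    total = domain ** 2
    return a_size + b_size - a_size * b_size / total


def estimate_path_size(edge_size, domain, tol=1e-9, max_iter=10_000):
    """Iterate the size estimate of `path` to a fixed point.

    Instead of evaluating the rules on tuples, each rule body is replaced
    by arithmetic on sizes: path <- edge UNION (edge JOIN path).
    """
    path = 0.0  # start from the empty relation, as in bottom-up evaluation
    for _ in range(max_iter):
        step = join_size(edge_size, path, domain)
        new_path = union_size(edge_size, step, domain)
        if abs(new_path - path) < tol:
            return new_path
        path = new_path
    return path  # not converged within max_iter; return the last estimate
```

Under these assumptions, 50 edges over 100 constants give an estimate of roughly 99.5 `path` tuples. Once the average out-degree exceeds 1, the estimate saturates at `domain ** 2`. That kind of coarse behavior is what dependency-aware statistics are meant to improve on.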