Deriving predicate statistics in Datalog

  • Authors: Senlin Liang; Michael Kifer

  • Affiliations: State University of New York at Stony Brook, Stony Brook, NY, USA (both authors)

  • Venue: Proceedings of the 12th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming
  • Year: 2010

Abstract

Database query optimizers rely on data statistics to select query execution plans. Similar optimization techniques are desirable for deductive databases, but applying them requires collecting data statistics for Datalog predicates. The difficulty is that Datalog predicates can be recursive. In this paper, we propose an algorithm, called SDP, that estimates Datalog query sizes efficiently by maintaining statistical dependency information for derived predicates. Base predicate statistics are computed and summarized using dependency matrices, and derived predicate statistics are computed by evaluating rules abstractly, with rule bodies replaced by algebraic expressions over the dependency matrices. Recursive rules are handled by fixed-point evaluation. Our experimental study shows that: 1) SDP produces better query size estimates than computing base predicate statistics and propagating them to derived predicates under the argument independence assumption; 2) the estimates largely preserve the relative order of real query sizes and thus can be used to guide cost-based query optimizers.
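To make the setting concrete, the following minimal Python sketch illustrates two ideas the abstract relies on: a size estimate for a derived predicate built only from base predicate statistics, and fixed-point evaluation of a recursive rule. It is not the SDP algorithm and does not use dependency matrices. The estimator is the textbook uniformity-based join estimate, used here only as a stand-in for an independence-style baseline, and all names (`actual_join_size`, `independence_estimate`, `transitive_closure`) and the sample data are assumptions made for illustration.

```python
from collections import Counter


def actual_join_size(p, q):
    """Exact number of answers to r(X, Y, Z) :- p(X, Y), q(Y, Z)."""
    q_counts = Counter(y for (y, _z) in q)
    return sum(q_counts[y] for (_x, y) in p)


def independence_estimate(p, q):
    """Textbook estimate |p| * |q| / max(distinct Y in p, distinct Y in q).

    Assumes join values are uniformly distributed; this is an
    illustrative baseline, not the estimator proposed in the paper.
    """
    distinct_p = len({y for (_x, y) in p})
    distinct_q = len({y for (y, _z) in q})
    return len(p) * len(q) / max(distinct_p, distinct_q)


def transitive_closure(edges):
    """Naive fixed-point evaluation of the recursive program
    path(X, Y) :- edge(X, Y).
    path(X, Y) :- path(X, Z), edge(Z, Y).
    """
    path = set(edges)
    while True:
        new = {(x, y) for (x, z) in path for (z2, y) in edges if z == z2}
        if new <= path:  # no new facts derived: fixed point reached
            return path
        path |= new


# Hypothetical skewed data: the value 'a' dominates the join column.
p = {(1, "a"), (2, "a"), (3, "a"), (4, "b")}
q = {("a", 10), ("a", 11), ("a", 12), ("b", 20)}

print(actual_join_size(p, q))       # 10
print(independence_estimate(p, q))  # 8.0
print(len(transitive_closure({(1, 2), (2, 3), (3, 4)})))  # 6
```

On this skewed sample the uniformity-based estimate (8.0) undershoots the true join size (10), which is the kind of error that tracking dependencies between arguments is meant to reduce. The fixed-point loop computes exact facts; an estimator for recursive predicates would iterate over statistics instead of tuples.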