General Database Statistics Using Entropy Maximization

Authors:
Raghav Kaushik; Christopher Ré; Dan Suciu
Affiliations:
Microsoft Research; University of Washington, Seattle; University of Washington, Seattle
Venue:
DBPL '09 Proceedings of the 12th International Symposium on Database Programming Languages
Year:
2009

Abstract

We propose a framework in which query sizes can be estimated from arbitrary statistical assertions on the data. In its most general form, a statistical assertion states that the size of the output of a conjunctive query over the data is a given number. A very simple example is a histogram, which makes assertions about the output sizes of several range queries. Our model also allows much more complex assertions that include joins and projections. To model such complex statistical assertions, we propose using the Entropy-Maximization (EM) probability distribution. In this model, every consistent set of statistics has a precise semantics, and every query has a precise size estimate. We show that several classes of statistics can be solved in closed form.
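The histogram case can be made concrete with a minimal Python sketch. It assumes, as is usual for entropy maximization, that each statistical assertion constrains the expected size of a query. It also assumes a histogram on attribute A of a binary relation R(A, B) whose buckets partition a small finite domain. Under those assumptions the EM distribution has a closed form. Maximizing entropy subject to one expected-count constraint per bucket yields a distribution over instances that factorizes into independent tuples: each candidate tuple in a bucket appears with probability equal to the bucket count divided by the number of candidate tuples in that bucket. The names Bucket, em_tuple_probabilities and estimate_range are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Bucket:
    """Hypothetical histogram bucket on attribute A of a relation R(A, B).

    Asserts that the expected number of R-tuples with lo <= A < hi is `count`.
    """
    lo: int
    hi: int
    count: int


def em_tuple_probabilities(buckets, domain_a, domain_b):
    """Closed-form EM solution for a histogram whose buckets partition A.

    Maximizing entropy subject to E[|R restricted to bucket i|] = count_i gives
    P(W) proportional to prod_i alpha_i ** |W in bucket i|, which factorizes:
    every candidate tuple (a, b) in bucket i is present independently with
    probability count_i / (number of candidate tuples in bucket i).
    Returns a map from each A value to that per-tuple probability.
    """
    probs = {}
    for bk in buckets:
        candidates = (bk.hi - bk.lo) * domain_b  # tuples (a, b) in the bucket
        p = Fraction(bk.count, candidates)
        if not 0 <= p <= 1:
            # No distribution can satisfy this assertion: the statistics are inconsistent.
            raise ValueError(f"inconsistent statistic for bucket {bk}")
        for a in range(bk.lo, bk.hi):
            if a in probs:
                raise ValueError("buckets overlap")
            probs[a] = p
    if set(probs) != set(range(domain_a)):
        raise ValueError("buckets must partition the domain of A")
    return probs


def estimate_range(probs, lo, hi, domain_b):
    """Expected size of sigma_{lo <= A < hi}(R) under the EM distribution.

    By linearity of expectation this is the sum of the presence probabilities
    of all candidate tuples (a, b) with lo <= a < hi.
    """
    return sum((probs[a] * domain_b for a in range(lo, hi)), Fraction(0))


# Domain: A, B in {0, ..., 9}, so R has 100 candidate tuples.
probs = em_tuple_probabilities([Bucket(0, 5, 30), Bucket(5, 10, 10)], 10, 10)
print(estimate_range(probs, 0, 3, 10))   # 3 * 10 * 3/5 = 18
print(estimate_range(probs, 2, 7, 10))   # 18 + 2 * 10 * 1/5 = 22
```

Querying the exact range of a bucket, for example estimate_range(probs, 0, 5, 10), returns 30, so the model reproduces the histogram's own assertions. A range that straddles two buckets is estimated by spreading each bucket's count uniformly over its candidate tuples. Exact Fraction arithmetic is used only so that the estimates print without rounding noise.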