Automatic tuning of data synopses

Authors:
Arnd Christian König; Gerhard Weikum
Affiliations:
Microsoft Research, One Microsoft Way, Redmond, WA; Department of Computer Science, University of the Saarland, P.O. Box 151150, 66041 Saarbrücken, Germany
Venue:
Information Systems - Special issue: Best papers from EDBT 2002
Year:
2003

Abstract

Maintaining statistics on multidimensional data distributions is crucial for predicting the run time and result size of queries and data analysis tasks with acceptable accuracy. Applications of such predictions include traditional query optimization, priority management and resource scheduling for data mining tasks, and querying heterogeneous Web data sources with diverse information quality. To this end, a plethora of techniques has been proposed for maintaining a compact data "synopsis" on a single table, ranging from variants of histograms to methods based on wavelets and other transforms. However, the fundamental question of how to reconcile the synopses of large information sources with many tables has remained largely unexplored. This paper develops a general framework for reconciling the synopses on many tables, which may come from different information sources. It shows how to compute an optimal combination of synopses for a given workload and a limited amount of available memory. Because computing the exact solution is computationally expensive, efficient heuristics are presented that limit the search space of synopsis combinations. Experiments demonstrate the practicality of the approach and the accuracy of the proposed heuristics.
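The core optimization described above, splitting a limited memory budget among synopses so that the workload's estimation error is as small as possible, can be illustrated with a small dynamic program. This is a minimal sketch under simplifying assumptions that are not taken from the paper: each candidate synopsis comes with a precomputed table `errors[i][m]` giving its estimation error when granted `m` memory units, `weights[i]` reflects how often the workload uses that synopsis, and the total error is the weighted sum of the per-synopsis errors. The function name `allocate_memory` and all of its parameters are hypothetical; it is a generic knapsack-style allocation, not necessarily the authors' algorithm.

```python
from typing import List, Sequence, Tuple


def allocate_memory(
    errors: Sequence[Sequence[float]],
    weights: Sequence[float],
    budget: int,
) -> Tuple[float, List[int]]:
    """Split `budget` memory units among synopses to minimize weighted error.

    Hypothetical model (an assumption, not the paper's definition):
    errors[i][m] is the precomputed error of synopsis i given m units,
    weights[i] is its workload frequency, and errors add up across synopses.
    """
    n = len(errors)
    inf = float("inf")
    # best[i][b]: minimal weighted error of the first i synopses using at most b units
    best = [[inf] * (budget + 1) for _ in range(n + 1)]
    # choice[i][b]: memory given to synopsis i-1 in that optimum
    choice = [[0] * (budget + 1) for _ in range(n + 1)]
    best[0] = [0.0] * (budget + 1)  # no synopses, no error
    for i in range(1, n + 1):
        err, w = errors[i - 1], weights[i - 1]
        for b in range(budget + 1):
            # try every memory share m for synopsis i-1 that fits in b
            for m in range(min(b, len(err) - 1) + 1):
                cost = best[i - 1][b - m] + w * err[m]
                if cost < best[i][b]:
                    best[i][b] = cost
                    choice[i][b] = m
    # walk the table backwards to recover each synopsis's share
    alloc = [0] * n
    b = budget
    for i in range(n, 0, -1):
        alloc[i - 1] = choice[i][b]
        b -= choice[i][b]
    return best[n][budget], alloc
```

For example, with error tables `[10, 6, 3, 2, 1.5]` and `[8, 4, 3, 2.5, 2.2]`, weights `[1.0, 2.0]` and a budget of 4 units, the sketch returns a total weighted error of 9.0 with the allocation `[2, 2]`. Even in this simplified form the search takes O(n·M²) time for n synopses and M memory units, and it assumes the set of synopses is fixed in advance; this gives some intuition for why heuristics that limit the search space of synopsis combinations are needed.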