SASH: a self-adaptive histogram set for dynamically changing workloads

Authors:
Lipyeow Lim; Min Wang; Jeffrey Scott Vitter
Affiliations:
Dept. of Computer Science, Duke University, Durham, NC; IBM T. J. Watson Research Center, Hawthorne, NY; Purdue University, West Lafayette, IN
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003

Citing 16
Cited 10

Learning internal representations by error propagation

Parallel distributed processing: explorations in the microstructure of cognition, vol. 1
Adaptive selectivity estimation using query feedback

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Relaxing the uniformity and independence assumptions using the concept of fractal dimension

Journal of Computer and System Sciences - Special issue on principles of database systems
Wavelet-based histograms for selectivity estimation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Self-tuning histograms: building histograms without looking at data

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Approximate computation of multidimensional aggregates of sparse data using wavelets

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Learning in graphical models

Learning in graphical models
Independence is good: dependency-based histogram synopses for high-dimensional data

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
STHoles: a multidimensional workload-aware histogram

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Global optimization of histograms

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity estimation using probabilistic models

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
LEO - DB2's LEarning Optimizer

Proceedings of the 27th International Conference on Very Large Data Bases
Selectivity Estimation Without the Attribute Value Independence Assumption

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Efficient Stepwise Selection in Decomposable Models

UAI '01 Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence
The minimum description length principle in coding and modeling

IEEE Transactions on Information Theory

Querying about the Past, the Present, and the Future in Spatio-Temporal Databases

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
CORDS: automatic discovery of correlations and soft functional dependencies

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Bloom histogram: path selectivity estimation for XML data with updates

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Automated statistics collection in DB2 UDB

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Detecting attribute dependencies from query feedback

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Consistent histograms in the presence of distinct value counts

Proceedings of the VLDB Endowment
How to juggle columns: an entropy-based approach for table compression

Proceedings of the Fourteenth International Database Engineering & Applications Symposium
Self-adaptive statistics management for efficient query processing

WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches

Foundations and Trends in Databases
Histograms as statistical estimators for aggregate queries

Information Systems

Abstract

Most RDBMSs maintain a set of histograms for estimating the selectivities of given queries. These selectivities are typically used for cost-based query optimization. While the problem of building an accurate histogram for a given attribute or attribute set has been well-studied, little attention has been given to the problem of building and tuning a set of histograms collectively for multidimensional queries in a self-managed manner based only on query feedback. In this paper, we present SASH, a Self-Adaptive Set of Histograms that addresses the problem of building and maintaining a set of histograms. SASH uses a novel two-phase method to automatically build and maintain itself using query feedback information only. In the online tuning phase, the current set of histograms is tuned in response to the estimation error of each query in an online manner. In the restructuring phase, a new and more accurate set of histograms replaces the current set of histograms. The new set of histograms (attribute sets and memory distribution) is found using information from a batch of query feedback. We present experimental results that show the effectiveness and accuracy of our approach.
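
The online tuning idea in the abstract (adjusting a histogram in response to each query's estimation error) can be illustrated with a minimal sketch. This is not the SASH algorithm itself: it assumes a single one-dimensional equi-width histogram and a simple proportional error-distribution rule, and the class name FeedbackHistogram, its methods, and the damping parameter are all hypothetical names chosen for illustration. The restructuring phase (choosing attribute sets and memory distribution) is not covered.

```python
class FeedbackHistogram:
    """Hypothetical 1-D equi-width histogram refined only from query feedback."""

    def __init__(self, lo, hi, num_buckets, total_rows):
        self.lo = lo
        self.width = (hi - lo) / num_buckets
        # No data scan: start from the uniformity assumption.
        self.counts = [total_rows / num_buckets] * num_buckets

    def _overlaps(self, a, b):
        """Return (bucket index, fraction of the bucket covered by [a, b))."""
        result = []
        for i in range(len(self.counts)):
            b_lo = self.lo + i * self.width
            covered = min(b, b_lo + self.width) - max(a, b_lo)
            if covered > 0:
                result.append((i, covered / self.width))
        return result

    def estimate(self, a, b):
        """Estimated number of rows with value in [a, b)."""
        return sum(self.counts[i] * frac for i, frac in self._overlaps(a, b))

    def tune(self, a, b, actual_rows, damping=0.5):
        """Spread the estimation error over the buckets the query touched.

        Assumed rule: each bucket absorbs a share of the error proportional
        to its contribution to the estimate, scaled by a damping factor.
        """
        overlaps = self._overlaps(a, b)
        if not overlaps:
            return
        est = sum(self.counts[i] * frac for i, frac in overlaps)
        error = actual_rows - est
        for i, frac in overlaps:
            if est > 0:
                share = self.counts[i] * frac / est
            else:
                share = 1.0 / len(overlaps)
            # Clamp at zero so a bucket never holds a negative count.
            self.counts[i] = max(0.0, self.counts[i] + damping * error * share)


hist = FeedbackHistogram(lo=0, hi=100, num_buckets=4, total_rows=1000)
before = hist.estimate(0, 50)
hist.tune(0, 50, actual_rows=800, damping=1.0)  # feedback from one executed query
after = hist.estimate(0, 50)
print(before, after)
```

A damping factor below 1 makes each correction partial, which trades slower convergence for less sensitivity to any single query's feedback.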