Sensitivity of self-tuning histograms: query order affecting accuracy and robustness

Authors:
Andranik Khachatryan;Emmanuel Müller;Christian Stier;Klemens Böhm
Affiliations:
Institute for Program Structures and Data Organization (IPD), Karlsruhe Institute of Technology (KIT), Germany (all authors)
Venue:
SSDBM'12 Proceedings of the 24th international conference on Scientific and Statistical Database Management
Year:
2012


Abstract

In scientific databases, the volume and complexity of data call for data summarization techniques. Such summaries support fast approximate query answering or query optimization. Histograms are a prominent class of model-free data summaries and are widely used in database systems. So-called self-tuning histograms refine themselves using the results of executed queries. A common assumption is that such histograms can learn the dataset from scratch. We show that this is not the case and highlight a major challenge that stems from it: traditional self-tuning is overly sensitive to the order of queries and reaches only local optima with high estimation errors. We show that a self-tuning method improves significantly if it starts from a carefully chosen initial configuration. We propose initialization by subspace clusters in projections of the data, which improves both the accuracy and the robustness of self-tuning histograms.
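The order sensitivity described in the abstract can be illustrated with a small toy, given here as a sketch under stated assumptions rather than the method evaluated in the paper. The class `ToySelfTuningHistogram`, the synthetic data, the bucket budget of three, and the two queries are all hypothetical. The toy is one-dimensional, splits buckets at query boundaries until the budget is used up, and never merges buckets, which real self-tuning histograms such as STHoles do.

```python
from bisect import bisect_right


class ToySelfTuningHistogram:
    """Hypothetical 1-D histogram refined only by query feedback (not STHoles)."""

    def __init__(self, lo, hi, max_buckets):
        self.edges = [lo, hi]        # bucket i covers [edges[i], edges[i + 1])
        self.counts = [0.0]          # estimated number of tuples per bucket
        self.max_buckets = max_buckets

    def _split(self, x):
        """Add a boundary at x unless it exists or the bucket budget is spent."""
        if x in self.edges or len(self.counts) >= self.max_buckets:
            return
        i = bisect_right(self.edges, x) - 1
        lo, hi = self.edges[i], self.edges[i + 1]
        frac = (x - lo) / (hi - lo)
        c = self.counts[i]
        self.edges.insert(i + 1, x)
        self.counts[i:i + 1] = [c * frac, c * (1 - frac)]

    def estimate(self, q_lo, q_hi):
        """Estimate the result size of [q_lo, q_hi), assuming uniform buckets."""
        total = 0.0
        for i, c in enumerate(self.counts):
            lo, hi = self.edges[i], self.edges[i + 1]
            overlap = max(0.0, min(hi, q_hi) - max(lo, q_lo))
            total += c * overlap / (hi - lo)
        return total

    def refine(self, q_lo, q_hi, result):
        """Use the tuples returned by an executed query as feedback."""
        self._split(q_lo)
        self._split(q_hi)
        for i in range(len(self.counts)):
            lo, hi = self.edges[i], self.edges[i + 1]
            if q_lo <= lo and hi <= q_hi:  # bucket lies fully inside the query
                self.counts[i] = float(sum(1 for p in result if lo <= p < hi))


# Synthetic data (an assumption): a dense cluster in [40, 50) plus sparse background.
DATA = [40 + 10 * k / 900 for k in range(900)] + [float(k) for k in range(100)]


def run(workload, probe=(40, 50)):
    """Train a fresh histogram on the workload, then estimate the probe query."""
    hist = ToySelfTuningHistogram(0, 100, max_buckets=3)
    for q_lo, q_hi in workload:
        result = [p for p in DATA if q_lo <= p < q_hi]
        hist.refine(q_lo, q_hi, result)
    return hist.estimate(*probe)


narrow, broad = (40, 50), (20, 70)
print(run([narrow, broad]))  # 910.0, matches the true count of 910
print(run([broad, narrow]))  # 190.0, the broad query used up the bucket budget
```

Both runs see the same two queries, yet the estimate for the cluster region differs by a wide margin depending on which query arrives first. In this toy, the effect comes only from the fixed bucket budget. It is meant to show why a well-chosen initial configuration can matter, not to reproduce the paper's experiments.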