A new histogram method for sparse attributes: the averaged rectangular attribute cardinality map

Authors:
B. John Oommen; Jing Chen
Affiliations:
Carleton University, Ottawa, Canada; Carleton University, Ottawa, Canada
Venue:
ISICT '03 Proceedings of the 1st international symposium on Information and communication technologies
Year:
2003

References (13):
Statistical profile estimation in database systems. ACM Computing Surveys (CSUR)
On the propagation of errors in the size of join results. SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Sequential sampling procedures for query size estimation. SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Balancing histogram optimality and practicality for query result size estimation. SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Association rules over interval data. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Approximate computation of multidimensional aggregates of sparse data using wavelets. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Dynamic multidimensional histograms. Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Accurate estimation of the number of tuples satisfying a condition. SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Histogram-Based Approximation of Set-Valued Query-Answers. VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Query Result Size Estimation Using a Novel Histogram-like Technique: The Rectangular Attribute Cardinality Map. IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications
Benchmarking attribute cardinality maps for database systems using the TPC-D specifications. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Cited by (1):
Optimizing queries with expensive video predicates in cloud environment. Concurrency and Computation: Practice & Experience

Abstract

Most current Database Management Systems (DBMSs) use histograms in query optimization to approximate query result sizes, because these estimates are used to determine efficient query evaluation plans. All existing methods perform poorly when the attribute values of a relation are very sparsely distributed, the so-called "sparse data cases". These cases are the worst-case scenarios for attributes with skewed distributions. In this paper, we propose a novel histogram-based algorithm, the Averaged Rectangular Attribute Cardinality Map (Averaged R-ACM), and demonstrate its performance in estimating query result sizes for the sparse data cases. The proposed algorithm combines the advantages of the traditional, widely used Equi-width histogram and of a relatively new algorithm, the R-ACM, introduced in [Thi99]. It thereby achieves two goals: compacting the sparse data distribution and obtaining accurate estimates of query result sizes. The superiority of the algorithm is validated by an extensive set of experiments. The entire set of experimental results, obtained by integrating the proposed algorithm and other histogram-based algorithms into the ORACLE query optimizer, can be found in [Che03].
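
To make the sparse-data problem concrete, the following is a minimal Python sketch of the classical Equi-width histogram used as a range-query size estimator. It is not the Averaged R-ACM, whose construction the abstract does not describe. The function names, the integer attribute domain, and the toy data are assumptions chosen for illustration only.

```python
def build_equi_width(values, lo, hi, num_buckets):
    """Count the values falling in each of num_buckets equal-width buckets.

    Assumes an integer attribute domain lo..hi (inclusive), so the domain
    spans hi - lo + 1 unit-wide slots.
    """
    width = (hi - lo + 1) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range(counts, width, lo, a, b):
    """Estimate how many tuples satisfy a <= value <= b.

    Uses the standard uniform-within-bucket assumption: each bucket
    contributes its count scaled by the fraction of the bucket that the
    query range overlaps.
    """
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width  # exclusive upper edge
        overlap = max(0.0, min(b + 1, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


# Hypothetical sparse attribute: only two distinct values occur in 0..99.
values = [3] * 50 + [97] * 50
counts, width = build_equi_width(values, lo=0, hi=99, num_buckets=4)

est_low = estimate_range(counts, width, 0, 0, 9)     # true answer: 50
est_gap = estimate_range(counts, width, 0, 10, 24)   # true answer: 0
est_all = estimate_range(counts, width, 0, 0, 99)    # true answer: 100

print(counts, est_low, est_gap, est_all)
```

In this toy case the histogram is exact over the full domain but underestimates the range 0..9 and reports tuples in the empty range 10..24, because the uniform assumption spreads the 50 occurrences of the value 3 across the whole first bucket. This is the kind of error on sparse data that the abstract says motivates the proposed method.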