Statistical profile estimation in database systems
ACM Computing Surveys (CSUR)
On the propagation of errors in the size of join results
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Sequential sampling procedures for query size estimation
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Association rules over interval data
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Approximate computation of multidimensional aggregates of sparse data using wavelets
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Dynamic multidimensional histograms
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Accurate estimation of the number of tuples satisfying a condition
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Histogram-Based Approximation of Set-Valued Query-Answers
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications
Benchmarking attribute cardinality maps for database systems using the TPC-D specifications
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Optimizing queries with expensive video predicates in cloud environment
Concurrency and Computation: Practice & Experience
Most current Database Management Systems (DBMSs) use histograms for query optimization and for approximating query result sizes, because accurate size estimates help the optimizer choose efficient query evaluation plans. All existing methods perform poorly when the attribute values of a relation are very sparsely distributed, the so-called "sparse data cases". These are the worst-case scenarios for attributes with skewed distributions. In this paper, we propose a novel histogram-based algorithm, the Averaged Rectangular Attribute Cardinality Map (Averaged R-ACM), and demonstrate its performance in estimating query result sizes for the sparse data cases. The algorithm combines the advantages of the traditional, widely used Equi-width histogram and a relatively new algorithm, the R-ACM introduced in [Thi99]. It thereby achieves two goals: compacting the sparse data distribution and obtaining accurate estimates of query result sizes. An extensive set of experiments validates its superiority. The entire set of experimental results, obtained by integrating the algorithm and other histogram-based algorithms into the ORACLE query optimizer, can be found in [Che03].
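To make the sparse-data problem concrete, the following is a minimal sketch of range-size estimation with a classical Equi-width histogram. It is not the Averaged R-ACM, whose construction is not given here. The function names `build_equi_width` and `estimate_range`, the sample data, and the assumption that values are uniformly spread within each bucket are all illustrative assumptions.

```python
from typing import List, Sequence


def build_equi_width(values: Sequence[float], lo: float, hi: float,
                     buckets: int) -> List[int]:
    """Count the values falling in each of `buckets` equal-width ranges of [lo, hi]."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        if lo <= v <= hi:
            # The maximum value hi is placed in the last bucket.
            idx = min(int((v - lo) / width), buckets - 1)
            counts[idx] += 1
    return counts


def estimate_range(counts: Sequence[int], lo: float, hi: float,
                   a: float, b: float) -> float:
    """Estimate how many values lie in [a, b].

    Assumption (illustrative): values are uniformly spread inside a bucket,
    so a bucket contributes in proportion to its overlap with [a, b].
    """
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = min(b, bucket_hi) - max(a, bucket_lo)
        if overlap > 0:
            total += count * overlap / width
    return total


# Hypothetical sparse attribute: three values clustered near 0, one at 100.
values = [1, 2, 3, 100]
counts = build_equi_width(values, lo=0, hi=100, buckets=4)
estimate = estimate_range(counts, 0, 100, 0, 12.5)
actual = sum(1 for v in values if 0 <= v <= 12.5)
print(counts, estimate, actual)
```

In this hypothetical data set the first bucket holds three values that all lie below 12.5, yet the uniformity assumption attributes only half of them to the range [0, 12.5]. Errors of this kind grow as the data become sparser, which is the situation the abstract describes.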