Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
An overview of query optimization in relational systems
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Selectivity estimation in spatial databases
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Compressed data cubes for OLAP aggregate query approximation on continuous dimensions
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Wavelet synopses with error guarantees
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Selectivity Estimation Without the Attribute Value Independence Assumption
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Selectivity Estimation of Complex Spatial Queries
SSTD '01 Proceedings of the 7th International Symposium on Advances in Spatial and Temporal Databases
Range Selectivity Estimation for Continuous Attributes
SSDBM '99 Proceedings of the 11th International Conference on Scientific and Statistical Database Management
Selectivity estimators for multidimensional range queries over real attributes
The VLDB Journal — The International Journal on Very Large Data Bases
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Hi-index | 0.00 |
A new technique for constructing multi-dimensional histograms is proposed. This technique first invokes a density-based clustering algorithm to locate dense and sparse regions of the input data. Then the data distribution inside each of these regions is summarized by partitioning it into non-overlapping blocks laid onto a grid. The granularity of this grid is chosen depending on the underlying data distribution: the more homogeneous the data, the coarser the grid. Our approach is compared with state-of-the-art histograms on both synthetic and real-life data and is shown to be more effective.