Equi-depth multidimensional histograms
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Statistical profile estimation in database systems
ACM Computing Surveys (CSUR)
On the propagation of errors in the size of join results
SIGMOD '91 Proceedings of the 1991 ACM SIGMOD international conference on Management of data
Balancing histogram optimality and practicality for query result size estimation
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Improved histograms for selectivity estimation of range predicates
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The Aqua approximate query answering system
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Implications of certain assumptions in database performance evauation
ACM Transactions on Database Systems (TODS)
Access path selection in a relational database management system
SIGMOD '79 Proceedings of the 1979 ACM SIGMOD international conference on Management of data
Accurate estimation of the number of tuples satisfying a condition
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Optimal Histograms with Quality Guarantees
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Universality of Serial Histograms
VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases
Selectivity Estimation Without the Attribute Value Independence Assumption
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Automating Statistics Management for Query Optimizers
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Executing SQL over encrypted data in the database-service-provider model
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Automatic tuning of data synopses
Information Systems - Special issue: Best papers from EDBT 2002
Supporting Efficient Parametric Search of E-Commerce Data: A Loosely-Coupled Solution
EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
A Framework for the Physical Design Problem for Data Synopses
EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
On Linear-Spline Based Histograms
WAIM '02 Proceedings of the Third International Conference on Advances in Web-Age Information Management
A multi-dimensional histogram for selectivity estimation and fast approximate query answering
CASCON '03 Proceedings of the 2003 conference of the Centre for Advanced Studies on Collaborative research
Hierarchical binary histograms for summarizing multi-dimensional data
Proceedings of the 2005 ACM symposium on Applied computing
Error minimization in approximate range aggregates
Data & Knowledge Engineering
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Efficient Process of Top-k Range-Sum Queries over Multiple Streams with Minimized Global Error
IEEE Transactions on Knowledge and Data Engineering
The history of histograms (abridged)
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
SASH: a self-adaptive histogram set for dynamically changing workloads
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Discovering gis sources on the web using summaries
Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries
Enhancing histograms by tree-like bucket indices
The VLDB Journal — The International Journal on Very Large Data Bases
Compressed hierarchical binary histograms for summarizing multi-dimensional data
Knowledge and Information Systems
Multiplicative synopses for relative-error metrics
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Enabling OLAP in mobile environments via intelligent data cube compression techniques
Journal of Intelligent Information Systems
Fast and effective histogram construction
Proceedings of the 18th ACM conference on Information and knowledge management
Synopses for probabilistic data over large domains
Proceedings of the 14th International Conference on Extending Database Technology
A quad-tree based multiresolution approach for two-dimensional summary data
Information Systems
Self-adaptive statistics management for efficient query processing
WAIM'05 Proceedings of the 6th international conference on Advances in Web-Age Information Management
Synopses reconciliation via calibration in the τ-synopses system
EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
Hi-index | 0.00 |
Histograms are frequently used to represent the distribution of data values in an attribute of a relation. Most previous work has focused on identifying the optimal histogram (given a limited number of buckets) for a single attribute independent of other attributes/histograms. In this paper, we propose the idea of global optimization of histograms, i.e., single-attribute histograms for a set of attributes are optimized collectively so as to minimize the overall error in using the histograms. The idea is to allocate more buckets to histograms whose attributes are more frequently used and/or distributions are highly skewed. While the accuracy of some histograms is penalized (being assigned fewer buckets), we expect the global error to be low compared to the traditional method (of allocating equal number of buckets to each histogram).We propose two algorithms to determine the histograms to construct for a collection of attributes. The first is based on dynamic programming, and the second is a greedy algorithm. We compare the overall error of these algorithms against the traditional method. Extensive experiments are conducted and the results confirm the benefits of global optimal histograms in reducing the overall error. The extent of improvement depends on the data and query distributions, ranging from no benefit when there is no significant differences in the data distributions to over a factor of 100 reduction in error in some cases we tried.The time to compute global optimal histogram using dynamic programming is much longer than the time to compute optimal histograms separately for each attribute, and the difference widens at a faster rate as the number of histograms increases. With the greedy algorithm, the time penalty is small, but the error reduction is somewhat less as well. We propose a third algorithm, called greedy algorithm with remedy, that has running time similar to the greedy algorithm, but produces results close to global optimum. In fact, in every experiment that we tried, this algorithm found the exact global optimum.