A histogram over a multi-dimensional data set is a synopsis consisting of aggregate data that summarize the values of the points inside non-overlapping ranges of the domain. Because they support fast (though approximate) estimation of the answers to aggregate range queries, histograms are widely used in contexts dealing with multi-dimensional data, especially those where the precision of the answers (within reasonable limits) is not the major requirement. However, the practical impact of histograms has been limited by the fact that, so far, no mechanism has been defined that provides a reliable (non-trivial) quantification of the degree of approximation of the query estimates. This paper addresses the problem by introducing a probabilistic framework for estimating the accuracy of the approximate answers obtained by evaluating aggregate queries over a histogram. Specifically, given a histogram over a data set, the answer to an aggregate range query is modeled as a random variable whose probability distribution depends on the type and the values of the aggregate data stored in the histogram. The mean and the variance of this random variable then represent an estimate of the actual answer to the query and of the associated error, respectively. The proposed framework can exploit different kinds of aggregates (namely, sum and count) stored in the histogram, as well as integrity constraints defined over the original data.
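To make the idea of "query answer as a random variable" concrete, the following is a minimal Python sketch for a count query only. It is not the framework proposed in the paper: it assumes, purely for illustration, a continuous domain in which each point of a bucket falls independently and uniformly inside that bucket, so the number of points of a bucket that land in the query range is binomial. The names `Bucket`, `overlap_fraction`, and `estimate_count` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Bucket:
    """Hypothetical histogram bucket: an axis-aligned box and its point count."""
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]
    count: int


def overlap_fraction(bucket: Bucket, q_lo: Sequence[float], q_hi: Sequence[float]) -> float:
    """Fraction of the bucket's volume covered by the query box [q_lo, q_hi]."""
    frac = 1.0
    for b_lo, b_hi, ql, qh in zip(bucket.lo, bucket.hi, q_lo, q_hi):
        width = b_hi - b_lo
        inter = min(b_hi, qh) - max(b_lo, ql)
        if width <= 0 or inter <= 0:
            return 0.0  # degenerate bucket or no overlap along this dimension
        frac *= inter / width
    return frac


def estimate_count(buckets: Sequence[Bucket], q_lo: Sequence[float], q_hi: Sequence[float]):
    """Mean and variance of the count answer under the illustrative assumption.

    Assumption (not from the paper): each bucket contributes an independent
    Binomial(count, p) variable, where p is the covered volume fraction.
    """
    mean = 0.0
    var = 0.0
    for b in buckets:
        p = overlap_fraction(b, q_lo, q_hi)
        mean += b.count * p            # expected points of this bucket in the range
        var += b.count * p * (1 - p)   # zero when the bucket is fully in or out
    return mean, var


buckets = [Bucket((0, 0), (10, 10), 100), Bucket((10, 0), (20, 10), 40)]
mean, var = estimate_count(buckets, (5, 0), (15, 10))
print(mean, var)  # 70.0 35.0
```

Under this simplified model, a bucket fully contained in the query range contributes no variance, which matches the intuition that only partially overlapped buckets introduce uncertainty. Handling sum aggregates or integrity constraints, as the abstract describes, would change the distribution and is not covered by this sketch.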