Summary Grids: Building Accurate Multidimensional Histograms

Authors:
Pedro Furtado;Henrique Madeira
Affiliations:
-;-
Venue:
DASFAA '99 Proceedings of the Sixth International Conference on Database Systems for Advanced Applications
Year:
1999



Abstract

Data summarization is important for many data analysis tasks. In this paper we propose a simple but efficient data summarization algorithm that outputs a histogram for multidimensional data, and we compare its behavior across different data distributions and against existing algorithms. The idea is to iteratively grow and modify regions of homogeneous data, in contrast to the commonly used strategy of iteratively splitting subspaces along straight lines. This work compares both strategies and concludes that the new technique is better and yields good results. We also conclude that treating outliers separately is important for obtaining good approximations.
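
To make the region-growing idea concrete, the following is a minimal sketch in Python. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a small 2-D grid of cell counts, rectangular buckets, and a homogeneity test of "max count minus min count is at most `tol`". The names `grow_buckets`, `estimate`, and `tol` are hypothetical and chosen only for illustration.

```python
def grow_buckets(grid, tol):
    """Greedily cover a 2-D grid of cell counts with rectangular buckets.

    Illustrative assumption: a bucket may keep growing while the spread
    (max - min) of the cell counts inside it stays within `tol`.
    Returns a list of (row0, row1, col0, col1, total) with inclusive bounds.
    """
    rows, cols = len(grid), len(grid[0])
    assigned = [[False] * cols for _ in range(rows)]
    buckets = []

    def ok(r0, r1, c0, c1):
        # A candidate region must be in bounds, unclaimed, and homogeneous.
        if r0 < 0 or c0 < 0 or r1 >= rows or c1 >= cols:
            return False
        cells = [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
        if any(assigned[r][c] for r, c in cells):
            return False
        vals = [grid[r][c] for r, c in cells]
        return max(vals) - min(vals) <= tol

    for r in range(rows):
        for c in range(cols):
            if assigned[r][c]:
                continue
            r0 = r1 = r
            c0 = c1 = c
            grew = True
            while grew:
                grew = False
                # Try extending by one row or column: down, right, up, left.
                for cand in ((r0, r1 + 1, c0, c1), (r0, r1, c0, c1 + 1),
                             (r0 - 1, r1, c0, c1), (r0, r1, c0 - 1, c1)):
                    if ok(*cand):
                        r0, r1, c0, c1 = cand
                        grew = True
                        break  # recompute candidates from the new bounds
            total = 0
            for i in range(r0, r1 + 1):
                for j in range(c0, c1 + 1):
                    assigned[i][j] = True
                    total += grid[i][j]
            buckets.append((r0, r1, c0, c1, total))
    return buckets


def estimate(buckets, query):
    """Estimate the count inside query = (row0, row1, col0, col1),
    assuming counts are spread uniformly within each bucket."""
    qr0, qr1, qc0, qc1 = query
    est = 0.0
    for r0, r1, c0, c1, total in buckets:
        h = min(r1, qr1) - max(r0, qr0) + 1  # overlapping rows
        w = min(c1, qc1) - max(c0, qc0) + 1  # overlapping columns
        if h > 0 and w > 0:
            area = (r1 - r0 + 1) * (c1 - c0 + 1)
            est += total * h * w / area
    return est


grid = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [5, 5, 0, 90],  # the 90 is an outlier cell
]
buckets = grow_buckets(grid, tol=1)
first_row_estimate = estimate(buckets, (0, 0, 0, 3))
```

On this toy grid the dense left block becomes one bucket and the outlier cell ends up in a bucket of its own, so it does not distort estimates for its empty neighbors. This loosely mirrors the abstract's remark about outliers, though the paper's own handling may differ.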