Clustering-based histograms for multi-dimensional data

Authors:
Filippo Furfaro;Giuseppe M. Mazzeo;Cristina Sirangelo
Affiliations:
DEIS, University of Calabria, Rende, Italy;DEIS, University of Calabria, Rende, Italy;DEIS, University of Calabria, Rende, Italy
Venue:
DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery
Year:
2005

Citing 9
Cited 1

Balancing histogram optimality and practicality for query result size estimation

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
An overview of query optimization in relational systems

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Selectivity estimation in spatial databases

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Compressed data cubes for OLAP aggregate query approximation on continuous dimensions

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Wavelet synopses with error guarantees

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Selectivity Estimation Without the Attribute Value Independence Assumption

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Selectivity Estimation of Complex Spatial Queries

SSTD '01 Proceedings of the 7th International Symposium on Advances in Spatial and Temporal Databases
Range Selectivity Estimation for Continuous Attributes

SSDBM '99 Proceedings of the 11th International Conference on Scientific and Statistical Database Management
Selectivity estimators for multidimensional range queries over real attributes

The VLDB Journal — The International Journal on Very Large Data Bases

Exploiting cluster analysis for constructing multi-dimensional histograms on both static and evolving data

EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

A new technique for constructing multi-dimensional histograms is proposed. This technique first invokes a density-based clustering algorithm to locate dense and sparse regions of the input data. Then the data distribution inside each of these regions is summarized by partitioning it into non-overlapping blocks laid onto a grid. The granularity of this grid is chosen depending on the underlying data distribution: the more homogeneous the data, the coarser the grid. Our approach is compared with state-of-the-art histograms on both synthetic and real-life data and is shown to be more effective.