Efficient construction of histograms for multidimensional data using quad-trees

Authors:
Yohan J. Roh; Jae Ho Kim; Jin Hyun Son; Myoung Ho Kim
Affiliations:
Data Analytics Group, Samsung Advanced Institute of Technology, Samsung Electronics, Nongseo-dong, Yongin Si Giheung-gu, Gyeonggi-Do 446-712, South Korea; Department of Computer Science, KAIST, 373-1 Guseong-dong, Yuseong-gu, Taejon 305-701, South Korea; Department of Computer Science and Engineering, Hanyang University, 1271 Sa-1 dong, Ansan, Kyunggi-do 425-791, South Korea; Department of Computer Science, KAIST, 373-1 Guseong-dong, Yuseong-gu, Taejon 305-701, South Korea
Venue:
Decision Support Systems
Year:
2011

Abstract

Histograms can be useful for estimating query selectivity in areas such as database query optimization and data exploration. In this paper, we propose a new histogram method for multidimensional data, called the Q-Histogram, based on the quad-tree, a popular index structure for multidimensional data sets. The compact representation of the target data obtainable from the quad-tree allows fast histogram construction with a single scan of the underlying data, which is the minimum possible. Besides this advantage in computation time, the proposed method also yields more accurate selectivity estimates than existing methods. We present a new measure of data skew for a histogram bucket, called the weighted bucket skew, and provide an effective technique for skew-tolerant organization of histograms. Finally, we compare the accuracy and efficiency of the proposed method with those of existing methods on both real-life and synthetic data sets. The experimental results show that the proposed method generally outperforms existing methods in both accuracy and computational efficiency.
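
To make the general idea concrete, the following is a minimal sketch of how a quad-tree can summarize two-dimensional points in one scan and then answer a range-count estimate. It is not the Q-Histogram algorithm itself: the abstract does not define the weighted bucket skew or the bucket organization step, so both are omitted. The names `Node`, `build`, and `estimate`, the fixed `max_depth`, and the uniform-spread assumption inside a leaf cell are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Node:
    """One quad-tree cell [x0, x1) x [y0, y1) with a point count (hypothetical layout)."""
    x0: float
    y0: float
    x1: float
    y1: float
    count: int = 0
    children: List["Node"] = field(default_factory=list)


def _insert(node: Node, x: float, y: float, max_depth: int) -> None:
    """Walk from the root to a leaf, incrementing the count of every cell on the path."""
    depth = 0
    while True:
        node.count += 1
        if depth == max_depth:
            return
        mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
        if not node.children:
            # Child order matches the index computed below: bit 0 = x half, bit 1 = y half.
            node.children = [
                Node(node.x0, node.y0, mx, my),
                Node(mx, node.y0, node.x1, my),
                Node(node.x0, my, mx, node.y1),
                Node(mx, my, node.x1, node.y1),
            ]
        idx = (1 if x >= mx else 0) + (2 if y >= my else 0)
        node = node.children[idx]
        depth += 1


def build(points: Iterable[Tuple[float, float]],
          bounds: Tuple[float, float, float, float] = (0.0, 0.0, 1.0, 1.0),
          max_depth: int = 4) -> Node:
    """Single pass over the data: each point is read once and routed to its cells."""
    root = Node(*bounds)
    for x, y in points:
        _insert(root, x, y, max_depth)
    return root


def estimate(node: Node, qx0: float, qy0: float, qx1: float, qy1: float) -> float:
    """Estimated number of points in the query rectangle [qx0, qx1] x [qy0, qy1]."""
    ox = max(0.0, min(node.x1, qx1) - max(node.x0, qx0))
    oy = max(0.0, min(node.y1, qy1) - max(node.y0, qy0))
    if ox == 0.0 or oy == 0.0 or node.count == 0:
        return 0.0
    if qx0 <= node.x0 and qy0 <= node.y0 and qx1 >= node.x1 and qy1 >= node.y1:
        return float(node.count)  # cell fully covered: its count is exact
    if not node.children:
        # Leaf partially covered: assume points are spread uniformly in the cell.
        area = (node.x1 - node.x0) * (node.y1 - node.y0)
        return node.count * (ox * oy) / area
    return sum(estimate(c, qx0, qy0, qx1, qy1) for c in node.children)


points = [(0.1, 0.1), (0.2, 0.2), (0.6, 0.6), (0.9, 0.9)]
root = build(points, max_depth=2)
selectivity = estimate(root, 0.0, 0.0, 0.5, 0.5) / root.count
```

Dividing the estimated count by `root.count` gives the selectivity of the range query. A fixed depth keeps the sketch short; a skew-aware method would instead decide which cells to keep or merge as buckets based on how unevenly the data falls inside them.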