Efficient construction of histograms for multidimensional data using quad-trees

  • Authors:
  • Yohan J. Roh; Jae Ho Kim; Jin Hyun Son; Myoung Ho Kim

  • Affiliations:
  • Data Analytics Group, Samsung Advanced Institute of Technology, Samsung Electronics, Nongseo-dong, Yongin Si Giheung-gu, Gyeonggi-Do 446-712, South Korea; Department of Computer Science, KAIST, 373-1 Guseong-dong, Yuseong-gu, Taejon 305-701, South Korea; Department of Computer Science and Engineering, Hanyang University, 1271 Sa-1 dong, Ansan, Kyunggi-do 425-791, South Korea; Department of Computer Science, KAIST, 373-1 Guseong-dong, Yuseong-gu, Taejon 305-701, South Korea

  • Venue:
  • Decision Support Systems
  • Year:
  • 2011

Abstract

Histograms are useful for estimating the selectivity of queries in areas such as database query optimization and data exploration. In this paper, we propose a new histogram method for multidimensional data, called the Q-Histogram, based on the quad-tree, a popular index structure for multidimensional data sets. The compact representation of the target data obtainable from the quad-tree allows fast construction of a histogram with the minimum number of scans of the underlying data, i.e., a single scan. In addition to this advantage in computation time, the proposed method provides better selectivity estimation quality than existing methods. We present a new measure of data skew for a histogram bucket, called the weighted bucket skew, and then provide an effective technique for skew-tolerant organization of histograms. Finally, we compare the accuracy and efficiency of the proposed method with existing methods using both real-life and synthetic data sets. The experimental results show that the proposed method generally outperforms existing methods in both accuracy and computational efficiency.
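
To make the idea of a single-scan, quad-tree-based summary concrete, the following is a minimal Python sketch. It is not the Q-Histogram algorithm from the paper: the abstract does not define the weighted bucket skew or the bucket organization technique, so neither is implemented here. The sketch only assumes two-dimensional points in a half-open bounding square, a fixed maximum depth, and uniformly distributed points inside each leaf cell. The names `Node`, `build`, `insert`, and `estimate` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Cell bounds: [x0, x1) x [y0, y1)
    x0: float
    y0: float
    x1: float
    y1: float
    depth: int
    count: int = 0
    children: list = field(default_factory=list)


def insert(root, x, y, max_depth):
    """Add one point, incrementing the count of every cell on its path."""
    node = root
    while True:
        node.count += 1
        if node.depth == max_depth:
            return
        mx = (node.x0 + node.x1) / 2
        my = (node.y0 + node.y1) / 2
        if not node.children:
            d = node.depth + 1
            # Child order: low-x/low-y, high-x/low-y, low-x/high-y, high-x/high-y
            node.children = [
                Node(node.x0, node.y0, mx, my, d),
                Node(mx, node.y0, node.x1, my, d),
                Node(node.x0, my, mx, node.y1, d),
                Node(mx, my, node.x1, node.y1, d),
            ]
        idx = (1 if x >= mx else 0) + (2 if y >= my else 0)
        node = node.children[idx]


def build(points, bounds=(0.0, 0.0, 1.0, 1.0), max_depth=4):
    """Build the count tree with exactly one pass over the data."""
    root = Node(*bounds, depth=0)
    for x, y in points:
        insert(root, x, y, max_depth)
    return root


def estimate(node, qx0, qy0, qx1, qy1):
    """Estimate how many points fall in the query rectangle.

    Leaf cells contribute their count scaled by the overlapping area
    (uniformity assumption inside a cell).
    """
    w = min(node.x1, qx1) - max(node.x0, qx0)
    h = min(node.y1, qy1) - max(node.y0, qy0)
    if w <= 0 or h <= 0 or node.count == 0:
        return 0.0
    if not node.children:
        area = (node.x1 - node.x0) * (node.y1 - node.y0)
        return node.count * (w * h) / area
    return sum(estimate(c, qx0, qy0, qx1, qy1) for c in node.children)


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.2), (0.8, 0.8), (0.9, 0.1)]
    tree = build(pts, max_depth=2)
    # Selectivity is the estimated count divided by the total count.
    print(estimate(tree, 0.0, 0.0, 0.5, 0.5) / tree.count)
```

In this sketch the leaf cells play the role of histogram buckets. A skew-aware method such as the one the abstract describes would presumably choose which cells to keep or merge based on a skew measure, rather than using a fixed depth.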