STHist-C: a highly accurate cluster-based histogram for two and three dimensional geographic data points

  • Authors:
  • Hai Thanh Mai;Jaeho Kim;Yohan J. Roh;Myoung Ho Kim

  • Affiliations:
  • Department of Computer Science, KAIST, Yuseong-Gu, South Korea 305-701;Department of Computer Science, KAIST, Yuseong-Gu, South Korea 305-701;Samsung Advanced Institute of Technology, Samsung Electronics, Yongin Si, South Korea 446-712;Department of Computer Science, KAIST, Yuseong-Gu, South Korea 305-701

  • Venue:
  • Geoinformatica
  • Year:
  • 2013

Abstract

Histograms are widely used to estimate selectivity in query optimization. In this paper, we propose a new histogram construction method for geographic data objects, which are used in many real-world applications. The method analyzes and exploits the clusters of objects that exist in a given data set to build histograms with significantly enhanced accuracy. Our philosophy is to allocate histogram buckets to the subspaces that properly capture object clusters. We therefore first propose a procedure to find the centers of object clusters, and then an algorithm to construct the histogram buckets from these centers. The buckets are initialized at the cluster centers and then expanded to cover the clusters; the best expansion plans are chosen based on a notion of skewness gain. Results from extensive experiments on real-life data sets demonstrate that the proposed method further improves histogram accuracy compared with the current state-of-the-art histogram construction method for geographic data objects.
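
As background for the selectivity estimation mentioned in the abstract, the sketch below shows how a bucket-based spatial histogram is commonly used to estimate the number of points a rectangular range query returns. It is not the STHist-C construction algorithm: the abstract does not define skewness gain or the expansion procedure, so neither is attempted here. The `Bucket` class, the function names, and the assumption that points are uniformly distributed inside each bucket are illustrative choices, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Hypothetical 2-D histogram bucket: an axis-aligned rectangle plus a point count."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    count: int  # number of data points that fall inside the rectangle


def overlap_area(bucket, query):
    """Area of the intersection between a bucket and a query (x_min, y_min, x_max, y_max)."""
    q_x_min, q_y_min, q_x_max, q_y_max = query
    width = min(bucket.x_max, q_x_max) - max(bucket.x_min, q_x_min)
    height = min(bucket.y_max, q_y_max) - max(bucket.y_min, q_y_min)
    # A negative width or height means the rectangles do not intersect.
    return max(width, 0.0) * max(height, 0.0)


def estimate_count(buckets, query):
    """Estimate how many points fall in the query rectangle.

    Assumes points are spread uniformly inside each bucket, so a bucket
    contributes its count scaled by the fraction of its area that the
    query covers.
    """
    total = 0.0
    for bucket in buckets:
        area = (bucket.x_max - bucket.x_min) * (bucket.y_max - bucket.y_min)
        if area > 0:
            total += bucket.count * overlap_area(bucket, query) / area
    return total


if __name__ == "__main__":
    histogram = [
        Bucket(0.0, 0.0, 10.0, 10.0, 100),
        Bucket(20.0, 20.0, 30.0, 30.0, 50),
    ]
    print(estimate_count(histogram, (5.0, 5.0, 25.0, 25.0)))
```

Under the uniformity assumption, the estimate is only as good as the buckets: a bucket that straddles a dense cluster and empty space misjudges both regions, which is why the abstract focuses on placing buckets so that they capture clusters.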