Bloom histogram: path selectivity estimation for XML data with updates

  • Authors:
  • Wei Wang;Haifeng Jiang;Hongjun Lu;Jeffrey Xu Yu

  • Affiliations:
  • School of Computer Science and Engineering, University of NSW, Australia and NICTA, Australia;Dept. of Computer Science, Hong Kong Univ. of Sci. & Tech., Hong Kong, China;Dept. of Computer Science, Hong Kong Univ. of Sci. & Tech., Hong Kong, China;Department of System Engineering and Engineering Management, The Chinese Univ. of Hong Kong, Hong Kong, China

  • Venue:
  • VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
  • Year:
  • 2004

Abstract

Cost-based XML query optimization calls for accurate estimation of the selectivity of path expressions. Interactive and Internet applications can also benefit from such estimates. Although a number of estimation techniques have been proposed in the literature, almost none of them guarantees estimation accuracy within a given space limit. In addition, most of them assume that the XML data are more or less static, i.e., subject to few updates. In this paper, we present a framework for XML path selectivity estimation in a dynamic context. Specifically, we propose a novel data structure, the bloom histogram, which approximates the XML path frequency distribution within a small space budget and supports accurate path selectivity estimation. We derive an upper bound on its estimation error and discuss the trade-off between accuracy and the space limit. To update bloom histograms efficiently when the underlying XML data change, a dynamic summary layer keeps exact or more detailed XML path information. Extensive experiments demonstrate that the new solution achieves significantly higher accuracy in even less space than previous methods, in both static and dynamic environments.
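
The abstract does not spell out the internal layout of a bloom histogram, so the following Python sketch is only an illustration under stated assumptions: paths are sorted by frequency and split into equal-sized buckets, and each bucket keeps a Bloom filter over its paths plus their average frequency. The class names, the equal-size bucketing rule, and the parameters `bits_per_bucket` and `num_hashes` are hypothetical choices made for this example, not the authors' construction. The dynamic summary layer for updates is not modeled.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; one byte per bit position for readability."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, key):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class BloomHistogram:
    """Assumed layout: buckets of (Bloom filter, average frequency)."""

    def __init__(self, path_counts, num_buckets, bits_per_bucket=256, num_hashes=3):
        # Sort paths by frequency so each bucket holds similar counts.
        ordered = sorted(path_counts.items(), key=lambda kv: kv[1])
        size = max(1, -(-len(ordered) // num_buckets))  # ceiling division
        self.buckets = []
        for start in range(0, len(ordered), size):
            group = ordered[start:start + size]
            bloom = BloomFilter(bits_per_bucket, num_hashes)
            for path, _ in group:
                bloom.add(path)
            average = sum(count for _, count in group) / len(group)
            self.buckets.append((bloom, average))

    def estimate(self, path):
        # Return the average frequency of the first bucket whose filter
        # reports the path; a false positive can select the wrong bucket.
        for bloom, average in self.buckets:
            if path in bloom:
                return average
        return 0.0
```

Calling `BloomHistogram({"/a": 100, "/a/b": 98, "/a/c": 3, "/a/c/d": 1}, num_buckets=2).estimate("/a/b")` returns the average of whichever bucket's filter matches first. Because Bloom filters have no false negatives, a stored path always maps to some bucket, and the estimate is wrong only by the spread within a bucket or by a filter false positive. This is one way to see the accuracy versus space trade-off the abstract mentions: more buckets and more bits per filter reduce both error sources.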