Selectivity Estimation for XML Twigs

  • Authors:
  • Neoklis Polyzotis;Minos Garofalakis;Yannis Ioannidis

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

Twig queries represent the building blocks of declarative query languages over XML data. A twig query describes a complex traversal of the document graph and generates a set of element tuples based on the intertwined evaluation (i.e., join) of multiple path expressions. Estimating the result cardinality of twig queries or, equivalently, the number of tuples in such a structural (path-based) join, is a fundamental problem that arises in the optimization of declarative queries over XML. It is crucial, therefore, to develop concise synopsis structures that summarize the document graph and enable such selectivity estimates within the time and space constraints of the optimizer. In this paper, we propose novel summarization and estimation techniques for estimating the selectivity of twig queries with complex XPath expressions over tree-structured data. Our approach is based on the XSKETCH model, augmented with new types of distribution information for capturing complex correlation patterns across structural joins. Briefly, the key idea is to represent joins as points in a multidimensional space of path counts that capture aggregate information on the contents of the resulting element tuples. We develop a systematic framework that combines distribution information with appropriate statistical assumptions in order to provide selectivity estimates for twig queries over concise XSKETCH synopses, and we describe an efficient algorithm for constructing an accurate summary for a given space budget. Implementation results with both synthetic and real-life data sets verify the effectiveness of our approach and demonstrate its benefits over earlier techniques.
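
To make the notion of twig selectivity concrete, the following minimal sketch counts the binding tuples of a small twig query exactly and compares the result with a naive estimate built from average per-element path counts. It is an illustration only and not the XSKETCH estimator described in the abstract: the sample document, the function names `twig_cardinality` and `independence_estimate`, and the independence assumption are all hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: paper and book counts are correlated
# (one author has many of both, the others have few).
DOC = """
<bib>
  <author><name/><paper/><paper/><paper/><book/><book/><book/></author>
  <author><name/><paper/></author>
  <author><name/><book/></author>
</bib>
"""


def twig_cardinality(root, root_path, branch_paths):
    """Exact number of binding tuples for a twig with one root path
    and several branch paths evaluated under each root match."""
    total = 0
    for elem in root.findall(root_path):
        tuples = 1
        for path in branch_paths:
            # Each branch contributes one binding per matching element.
            tuples *= len(elem.findall(path))
        total += tuples
    return total


def independence_estimate(root, root_path, branch_paths):
    """Naive estimate: number of root matches times the product of the
    average branch counts. Assumes branch counts are independent."""
    roots = root.findall(root_path)
    if not roots:
        return 0.0
    estimate = float(len(roots))
    for path in branch_paths:
        avg = sum(len(e.findall(path)) for e in roots) / len(roots)
        estimate *= avg
    return estimate


root = ET.fromstring(DOC)
exact = twig_cardinality(root, "author", ["paper", "book"])
approx = independence_estimate(root, "author", ["paper", "book"])
print(exact, approx)
```

On this document the exact count is 9 tuples, while the averaged estimate is 16/3 (about 5.33). The gap comes from the correlation between the two branch counts, which is the kind of pattern that per-element distributions of path counts are meant to capture.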