Path Summaries and Path Partitioning in Modern XML Databases

  • Authors:
  • Andrei Arion;Angela Bonifati;Ioana Manolescu;Andrea Pugliese

  • Affiliations:
  • INRIA Futurs---LRI, Orsay, France;ICAR CNR, Palermo, Italy;INRIA Futurs---LRI, Orsay, France and INRIA Futurs, Gemo group, Orsay Cedex, France 91893;University of Calabria, Rende, Italy

  • Venue:
  • World Wide Web
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

XML path summaries are compact structures representing all the simple parent-child paths of an XML document. Such paths have also been used in many works as a basis for partitioning the document's content in a persistent store, under the form of path indices or path tables. We revisit the notions of path summaries and path-driven storage model in the context of current-day XML databases. This context is characterized by complex queries, typically expressed in an XQuery subset, and by the presence of efficient encoding techniques such as structural node identifiers. We review a path summary's many uses for query optimization, and given them a common basis, namely relevant paths. We discuss summary-based tree pattern minimization and present some efficient summary-based minimization heuristics. We consider relevant path computation and provide a time- and memory-efficient computation algorithm. We combine the principle of path partitioning with the presence of structural identifiers in a simple path-partitioned storage model, which allows for selective data access and efficient query plans. This model improves the efficiency of twig query processing up to two orders of magnitude over the similar tag-partitioned indexing model. We have implemented the path-partitioned storage model and path summaries in the XQueC compressed database prototype [8]. We present an experimental evaluation of a path summary's practical feasibility and of tree pattern matching in a path-partitioned store.