Path Summaries and Path Partitioning in Modern XML Databases

Authors:
Andrei Arion;Angela Bonifati;Ioana Manolescu;Andrea Pugliese
Affiliations:
INRIA Futurs - LRI, Orsay, France;ICAR CNR, Palermo, Italy;INRIA Futurs - LRI, Orsay, France and INRIA Futurs, Gemo group, 91893 Orsay Cedex, France;University of Calabria, Rende, Italy
Venue:
World Wide Web
Year:
2008

Abstract

XML path summaries are compact structures representing all the simple parent-child paths of an XML document. Such paths have also been used in many works as a basis for partitioning the document's content in a persistent store, in the form of path indices or path tables. We revisit the notions of path summaries and path-driven storage models in the context of current-day XML databases. This context is characterized by complex queries, typically expressed in an XQuery subset, and by the presence of efficient encoding techniques such as structural node identifiers. We review a path summary's many uses for query optimization and give them a common basis, namely relevant paths. We discuss summary-based tree pattern minimization and present some efficient summary-based minimization heuristics. We consider relevant path computation and provide a time- and memory-efficient computation algorithm. We combine the principle of path partitioning with the presence of structural identifiers in a simple path-partitioned storage model, which allows for selective data access and efficient query plans. This model improves the efficiency of twig query processing by up to two orders of magnitude over the similar tag-partitioned indexing model. We have implemented the path-partitioned storage model and path summaries in the XQueC compressed database prototype [8]. We present an experimental evaluation of a path summary's practical feasibility and of tree pattern matching in a path-partitioned store.
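
To make the two central notions concrete, the following is a minimal sketch, not the paper's algorithm or the XQueC implementation. It assumes (pre, post) traversal ranks as the structural node identifiers, which is one common scheme; the abstract does not say which encoding the authors use. The names `build_path_partitions`, `is_ancestor`, and `DOC` are hypothetical. The keys of the resulting dictionary play the role of the path summary, and each key's list of identifiers plays the role of one path partition.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_partitions(xml_text):
    """Map each distinct root-to-node path to the (pre, post) ids of its nodes.

    Assumption: (pre, post) ranks stand in for structural identifiers.
    Text and attribute nodes are ignored to keep the sketch short.
    """
    root = ET.fromstring(xml_text)
    partitions = defaultdict(list)
    counter = {"pre": 0, "post": 0}

    def visit(elem, parent_path):
        path = parent_path + "/" + elem.tag
        counter["pre"] += 1
        pre = counter["pre"]
        for child in elem:
            visit(child, path)
        counter["post"] += 1
        # Nodes on the same path sit at the same depth, so none contains
        # another and appending here keeps each partition in document order.
        partitions[path].append((pre, counter["post"]))

    visit(root, "")
    return dict(partitions)


def is_ancestor(a, d):
    """True if the node with id a is an ancestor of the node with id d."""
    return a[0] < d[0] and a[1] > d[1]


DOC = (
    "<lib>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title></book>"
    "</lib>"
)

parts = build_path_partitions(DOC)
summary = sorted(parts)  # the distinct paths, i.e. a flat path summary

# Evaluate //book[author]/title by reading only the three relevant partitions
# and comparing identifiers, without touching any other part of the document.
books = parts["/lib/book"]
authors = parts["/lib/book/author"]
titles = parts["/lib/book/title"]
books_with_author = [b for b in books if any(is_ancestor(b, a) for a in authors)]
result = [t for t in titles if any(is_ancestor(b, t) for b in books_with_author)]
```

The nested loops stand in for a structural join and are quadratic; a real store would use a merge-based join over the sorted partitions. The point of the sketch is only that the summary identifies which partitions a query needs, and that the identifiers alone are enough to reconnect nodes from different partitions.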