XPath query processing improvements

  • Authors:
  • P. Mark Pettovello; Farshad Fotouhi

  • Affiliations:
  • Wayne State University, Detroit, Michigan; Wayne State University, Detroit, Michigan

  • Venue:
  • Proceedings of the 2010 Conference of the Center for Advanced Studies on Collaborative Research
  • Year:
  • 2010

Abstract

Much research has been done on adapting relational technology to XML and XPath query processing; other efforts have focused on native XML databases, and some on hybrid approaches. This paper presents a hybrid design: we extend the use of path summary indexes by combining them with partitioned indexes on schema-less XML documents to accelerate XPath query processing. Efficient XPath query processing matters because XPath is the language XQuery uses for node selection. To index an XML document, each node is assigned the path identifier of its root-to-node path, and each distinct path has its own identifier. A separate path summary index, itself encoded as an XML document, summarizes the document structure by eliminating the path redundancies inherent in many XML document instances; structure summaries of this kind are widely used. Two additional supporting indexes are used: first, the XML structure is placed into a structure index partitioned by path identifier, and second, the XML element and attribute values are placed into a separate value index partitioned by the same path identifier. Together, these integrate structure summaries, complete structure, and values into a unified index. To support this integration, we use novel implementation and query methods. XPath queries are first executed, in part or in full, against the summary index to derive candidate path identifiers, which are placed into a specialized hash map tree cursor. We introduce the partitioned branching path join, a twig join that, guided by the hash map tree cursor, performs efficient index nested loop joins between B+-tree partitions of the same structure relation.
We conclude with performance results for several XPath queries on our lightweight prototype system, which show that our combination of methods matches or outperforms existing high-end database engines in determining node sequences.
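The abstract does not give the authors' data structures or code, so the Python below is only a minimal sketch of the general idea under several assumptions. Nodes get (start, end) region labels from a depth-first counter (the abstract does not name a labeling scheme). The path summary is a plain dict rather than an XML document. Sorted lists searched with `bisect` stand in for B+-tree partitions, and direct per-path-identifier lookups replace the hash map tree cursor. Attributes are ignored. The query shape `trunk[branch = "value"]/output` and every function name (`build_indexes`, `candidate_pids`, `branching_join`) are hypothetical.

```python
import bisect
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sketch, not the authors' implementation: region labels are an
# assumption, and sorted lists stand in for B+-tree partitions.


def build_indexes(xml_text):
    """Assign path IDs and region labels, then fill the partitioned indexes."""
    root = ET.fromstring(xml_text)
    summary = {}                    # path summary: root-to-node tag path -> path ID
    structure = defaultdict(list)   # structure index: path ID -> [(start, end)]
    values = defaultdict(list)      # value index: path ID -> [(text, start)]
    counter = 0

    def visit(elem, path):
        nonlocal counter
        path = path + (elem.tag,)
        pid = summary.setdefault(path, len(summary))  # same path -> same ID
        start = counter
        counter += 1
        for child in elem:
            visit(child, path)
        end = counter
        counter += 1
        structure[pid].append((start, end))
        text = (elem.text or "").strip()
        if text:
            values[pid].append((text, start))

    visit(root, ())
    for part in list(structure.values()) + list(values.values()):
        part.sort()  # keep each partition ordered, as a B+-tree would
    return summary, structure, values


def parse_path(xpath):
    """Split a linear path such as "/a//b/c" into (axis, tag) steps."""
    steps, i = [], 0
    while i < len(xpath):
        axis = "//" if xpath.startswith("//", i) else "/"
        i += len(axis)
        j = i
        while j < len(xpath) and xpath[j] != "/":
            j += 1
        steps.append((axis, xpath[i:j]))
        i = j
    return steps


def matches(steps, path):
    """Check whether a summary path satisfies the remaining steps."""
    if not steps:
        return not path
    (axis, tag), rest = steps[0], steps[1:]
    if axis == "/":
        return bool(path) and path[0] == tag and matches(rest, path[1:])
    # "//": the tag may appear at any remaining depth
    return any(path[k] == tag and matches(rest, path[k + 1:]) for k in range(len(path)))


def candidate_pids(summary, xpath):
    """Run the path against the summary only, yielding candidate path IDs."""
    steps = parse_path(xpath)
    return [pid for path, pid in summary.items() if matches(steps, path)]


def branching_join(indexes, trunk, branch, value, output):
    """Evaluate trunk[branch = "value"]/output with index nested loop joins."""
    summary, structure, values = indexes
    paths = {pid: path for path, pid in summary.items()}
    hits = []
    for pid in candidate_pids(summary, trunk):
        branch_pid = summary.get(paths[pid] + (branch,))
        out_pid = summary.get(paths[pid] + (output,))
        if branch_pid is None or out_pid is None:
            continue  # the summary proves this partition cannot match
        # Value-index probe: starts of branch nodes whose text equals `value`.
        part = values[branch_pid]
        lo = bisect.bisect_left(part, (value, -1))
        hi = bisect.bisect_right(part, (value, float("inf")))
        keys = sorted(start for _, start in part[lo:hi])
        outs = structure[out_pid]
        text_of = {start: text for text, start in values[out_pid]}
        for start, end in structure[pid]:  # outer loop over trunk nodes
            k = bisect.bisect_right(keys, start)
            if k == len(keys) or keys[k] >= end:
                continue  # no matching branch node inside this trunk node
            j = bisect.bisect_left(outs, (start,))  # inner index probe
            while j < len(outs) and outs[j][0] < end:
                hits.append((outs[j][0], text_of.get(outs[j][0])))
                j += 1
    return [text for _, text in sorted(hits)]  # document order


doc = ("<lib><book><author>Knuth</author><title>TAOCP</title></book>"
       "<shelf><book><author>Knuth</author><title>Concrete Math</title></book></shelf>"
       "<book><author>Date</author><title>Intro to DB</title></book></lib>")
idx = build_indexes(doc)
print(branching_join(idx, "//book", "author", "Knuth", "title"))
# ['TAOCP', 'Concrete Math']
```

Because the output partition's path extends the trunk partition's path by exactly one step, containment in a trunk node's region already implies a parent-child relationship, so the sketch needs no level check. This is one way path identifiers can prune work before any join runs: partitions whose paths cannot satisfy the query are never read.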