Efficiently supporting order in XML query processing

  • Authors: Maged El-Sayed; Katica Dimitrova; Elke A. Rundensteiner

  • Affiliations: Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA; Microsoft Corporation, One Microsoft Way, Redmond, WA and Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA; Department of Computer Science, Worcester Polytechnic Institute, Worcester, MA

  • Venue: Data & Knowledge Engineering - Special issue: WIDM 2003

  • Year: 2005

Abstract

XML is an ordered data model, and XQuery expressions return results that have a well-defined order. However, little work has been done to date on how order is supported in XML query processing. In this paper, we study the issues related to handling order in the XML context, namely the challenges imposed by the XML data model, the variety of order requirements of the XQuery language, and the need to maintain order in the presence of updates to the XML data. We propose an efficient solution that addresses all of these issues. Our solution is based on a key encoding for XML nodes that serves as node identity and at the same time encodes order. We design rules for encoding the order of processed XML nodes based on the XML algebraic query execution model and the node key encoding. These rules do not require any actual sorting of intermediate results during execution. Our approach enables efficient order-sensitive incremental view maintenance, as it makes most XML algebra operators distributive with respect to bag union. We prove the correctness of our order encoding approach. Our approach is implemented and integrated with Rainbow, an XML data management system developed at WPI. We have tested the efficiency of our approach using queries that have different order requirements. We have also measured the relative cost of the different components of our order solution for different types of queries. In general, the overhead of maintaining order in our approach is very small relative to the query processing time.
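
To make the idea of a key that serves as both node identity and order encoding more concrete, the following is a minimal Python sketch. It assumes a Dewey-style path key with one string label per tree level; this is an illustrative assumption and may differ from the encoding actually used in the paper. The names `NodeKey`, `child`, `is_ancestor_of`, and `document_order` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True, order=True)
class NodeKey:
    """Hypothetical node key: one string label per level of the XML tree.

    The key identifies the node, and comparing two keys (tuple comparison
    on `path`) yields their relative document order.
    """
    path: Tuple[str, ...]

    def child(self, label: str) -> "NodeKey":
        # A child's key extends its parent's key by one label.
        return NodeKey(self.path + (label,))

    def is_ancestor_of(self, other: "NodeKey") -> bool:
        # A node is an ancestor if its path is a proper prefix of the other's.
        n = len(self.path)
        return n < len(other.path) and other.path[:n] == self.path


def document_order(keys: Iterable[NodeKey]) -> List[NodeKey]:
    # Sorting is used here only to show that key comparison alone
    # recovers document order; no separate position field is needed.
    return sorted(keys)


root = NodeKey(("b",))
first = root.child("b")
last = root.child("d")

# Simulated update: a new sibling is placed between `first` and `last`
# by choosing a label that sorts between theirs. No existing key changes.
inserted = root.child("c")

ordered = document_order([last, inserted, root, first])
```

In this sketch, a parent's key is a prefix of its children's keys, so it compares as smaller and appears first, which matches document order. Because labels are strings rather than consecutive integers, a new node can often be given a label that sorts between its neighbours without renumbering existing nodes, which is one way an encoding of this kind could tolerate updates.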