NEXSORT: Sorting XML in External Memory

  • Authors:
  • Adam Silberstein; Jun Yang

  • Affiliations:
  • -;-

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially for one as fundamental as sort. In this paper we present NEXSORT, an algorithm that leverages the hierarchical nature of XML to efficiently sort an XML document in external memory. In a fully sorted XML document, children of every non-leaf element are ordered according to a given sorting criterion. Among NEXSORT's uses is in combination with structural merge as the XML version of sort-merge join, which allows us to merge large XML documents using only a single pass once they are sorted.

The hierarchical structure of an XML document limits the number of possible legal orderings among its elements, which means that sorting XML is fundamentally "easier" than sorting a flat file. We prove that the I/O lower bound for sorting XML in external memory is Ω(max{n, n log_m(k/B)}), where n is the number of blocks in the input XML document, m is the number of main memory blocks available for sorting, B is the number of elements that can fit in one block, and k is the maximum fan-out of the input document tree. We show that NEXSORT performs within a constant factor of this theoretical lower bound. In practice we demonstrate that, even with a naive implementation, NEXSORT significantly outperforms a regular external merge sort of all elements by their key paths, unless the XML document is nearly flat, in which case NEXSORT degenerates essentially to external merge sort.
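
To make the abstract's definition of a fully sorted document concrete, the following is a minimal in-memory sketch in Python. It is not NEXSORT and does not use external memory; it only illustrates the property that the children of every non-leaf element are ordered by a sorting criterion. The function names, the default key (tag name, then a hypothetical "id" attribute), and the sample document are assumptions made for illustration. The second function simply evaluates the expression inside the Ω bound, ignoring constant factors.

```python
import math
import xml.etree.ElementTree as ET


def sort_children(elem, key):
    """Recursively order the children of every non-leaf element by `key`."""
    for child in elem:
        sort_children(child, key)
    # Slice assignment replaces the child list in place; sorted() is stable.
    elem[:] = sorted(elem, key=key)


def sort_xml(text, key=lambda e: (e.tag, e.get("id", ""))):
    """Parse `text`, fully sort it in memory, and return the serialized result.

    The default key (tag, then "id" attribute) is an assumed criterion.
    """
    root = ET.fromstring(text)
    sort_children(root, key)
    return ET.tostring(root, encoding="unicode")


def io_bound_term(n, m, k, B):
    """Evaluate max{n, n * log_m(k / B)}, the expression inside the bound.

    n: input size in blocks, m: memory blocks available for sorting,
    B: elements per block, k: maximum fan-out. Constants are ignored.
    """
    return max(n, n * math.log(k / B, m))


if __name__ == "__main__":
    doc = '<r><b id="2"/><b id="1"/><a><z/><y/></a></r>'
    print(sort_xml(doc))
    print(io_bound_term(n=1000, m=100, k=1000, B=10))
```

Note that when k is smaller than B the logarithm is negative, so the expression reduces to n, which matches the intuition that a single pass over the input is always required. The recursive helper is bounded by Python's recursion limit and would need an explicit stack for very deep documents.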