XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially for one as fundamental as sort. In this paper we present NEXSORT, an algorithm that leverages the hierarchical nature of XML to efficiently sort an XML document in external memory. In a fully sorted XML document, the children of every non-leaf element are ordered according to a given sorting criterion. One use of NEXSORT is in combination with structural merge, as the XML version of sort-merge join, which allows us to merge large XML documents using only a single pass once they are sorted.

The hierarchical structure of an XML document limits the number of possible legal orderings among its elements, which means that sorting XML is fundamentally "easier" than sorting a flat file. We prove that the I/O lower bound for sorting XML in external memory is Θ(max{n, n log_m(k/B)}), where n is the number of blocks in the input XML document, m is the number of main memory blocks available for sorting, B is the number of elements that can fit in one block, and k is the maximum fan-out of the input document tree. We show that NEXSORT performs within a constant factor of this theoretical lower bound. In practice we demonstrate that, even with a naive implementation, NEXSORT significantly outperforms a regular external merge sort of all elements by their key paths, unless the XML document is nearly flat, in which case NEXSORT degenerates essentially to external merge sort.
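To make the notion of a "fully sorted" document concrete, here is a minimal in-memory sketch in Python. It is not NEXSORT and does no external-memory work; it only illustrates the target ordering, in which the children of every non-leaf element are sorted. The function names `sort_children` and `by_tag_then_id`, the sample document, and the sorting criterion (tag name, then an `id` attribute) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def sort_children(elem, key):
    """Recursively order the children of every non-leaf element by `key`.

    In-memory illustration only: the whole tree is assumed to fit in RAM,
    which is exactly the assumption an external-memory sort cannot make.
    """
    for child in elem:
        sort_children(child, key)
    # Slice assignment replaces the child list with its sorted version.
    elem[:] = sorted(elem, key=key)


def by_tag_then_id(e):
    """Hypothetical sorting criterion: tag name first, then the 'id' attribute."""
    return (e.tag, e.get("id", ""))


# Small sample document (an assumption, not taken from the paper).
doc = ET.fromstring('<root><b id="2"><z/><a/></b><a id="9"/><b id="1"/></root>')
sort_children(doc, by_tag_then_id)
print(ET.tostring(doc, encoding="unicode"))
```

Because siblings are only ever compared with each other, each element's children form an independent sorting problem. This is the structural property the abstract refers to when it says the hierarchy limits the number of legal orderings.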