Parsing XML using parallel traversal of streaming trees

  • Authors:
  • Yinfei Pan; Ying Zhang; Kenneth Chiu

  • Affiliations:
  • Department of Computer Science, State University of New York, Binghamton, NY; Department of Computer Science, State University of New York, Binghamton, NY; Department of Computer Science, State University of New York, Binghamton, NY

  • Venue:
  • HiPC'08 Proceedings of the 15th international conference on High performance computing
  • Year:
  • 2008

Abstract

XML has been widely adopted across a wide spectrum of applications. Its parsing efficiency, however, remains a concern, and can be a bottleneck. With the current trend towards multicore CPUs, parallelization to improve performance is increasingly relevant. In many applications, the XML is streamed from the network, and thus the complete XML document is never in memory at any single moment in time. Parallel parsing of such a stream can be equated to parallel depth-first traversal of a streaming tree. Existing research on parallel tree traversal has assumed the entire tree was available in-memory, and thus cannot be directly applied. In this paper we investigate parallel, SAX-style parsing of XML via a parallel, depth-first traversal of the streaming document. We show good scalability up to about 6 cores on a Linux platform.
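
To illustrate the correspondence the abstract draws between SAX-style parsing and depth-first traversal, here is a minimal sequential sketch using Python's standard xml.sax module. It is not the authors' parallel algorithm. It only shows that when a document arrives in chunks, the SAX events fire in depth-first order even though the whole tree is never held in memory. The names DepthFirstRecorder and parse_stream and the sample chunks are hypothetical, chosen for illustration.

```python
import xml.sax


class DepthFirstRecorder(xml.sax.ContentHandler):
    """Hypothetical handler: records SAX events in arrival order.

    The order in which start/end events arrive is a depth-first
    walk of the element tree.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.events = []

    def startElement(self, name, attrs):
        # Entering an element: record it at the current depth, then descend.
        self.events.append(("start", name, self.depth))
        self.depth += 1

    def endElement(self, name):
        # Leaving an element: ascend, then record it at its own depth.
        self.depth -= 1
        self.events.append(("end", name, self.depth))


def parse_stream(chunks):
    """Feed the document chunk by chunk, as if it were read from a socket."""
    handler = DepthFirstRecorder()
    parser = xml.sax.make_parser()  # expat-based incremental parser
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # only the current chunk is handed to the parser
    parser.close()
    return handler.events


# Assumed sample input: one small document split across three chunks.
events = parse_stream(["<a><b>", "<c/></b>", "<d/></a>"])
for kind, name, depth in events:
    print("  " * depth + kind + " " + name)
```

A parallel version in the spirit of the paper would need to divide this traversal among several threads while the stream is still arriving; that division of work is the problem the paper addresses and is not attempted here.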