A Parallel Approach to XML Parsing

  • Authors:
  • Wei Lu; Kenneth Chiu; Yinfei Pan

  • Affiliations:
  • Computer Science Department, Indiana University, 150 S. Woodlawn Ave. Bloomington, IN 47405, US. welu@cs.indiana.edu; Department of Computer Science, State University of New York - Binghamton, P.O. Box 6000, Binghamton, NY 13902, US. kchiu@cs.binghamton.edu; Department of Computer Science, State University of New York - Binghamton, P.O. Box 6000, Binghamton, NY 13902, US. ypan3@cs.binghamton.edu

  • Venue:
  • GRID '06 Proceedings of the 7th IEEE/ACM International Conference on Grid Computing
  • Year:
  • 2006

Abstract

A language for semi-structured documents, XML has emerged as the core of the web services architecture, and plays crucial roles in messaging systems, databases, and document processing. However, the processing of XML documents has a reputation for poor performance. A number of optimizations have been developed to address this performance problem from different perspectives, none of which has been entirely satisfactory. In this paper, we present a seemingly quixotic, but novel approach: parallel XML parsing. Parallel XML parsing leverages the growing prevalence of multicore architectures in all sectors of the computer market, and yields significant performance improvements. This paper presents our design and implementation of parallel XML parsing. Our design consists of an initial preparsing phase to determine the structure of the XML document, followed by a full, parallel parse. The results of the preparsing phase are used to help partition the XML document for data-parallel processing. Our parallel parsing phase is a modification of the libxml2 [1] XML parser, which shows that our approach applies to real-world, production-quality parsers. Our empirical study shows that our parallel XML parsing algorithm can significantly improve XML parsing performance and scales well.
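
The two-phase design described in the abstract (a lightweight preparse that finds the document structure, followed by a partitioned, data-parallel full parse) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses Python's standard library rather than libxml2, and the function names `preparse`, `partition`, `parse_chunk`, and `parallel_parse` are hypothetical. The preparse here is deliberately simplified. It assumes the document has no comments, CDATA sections, or processing instructions inside the root, no `>` inside attribute values, and no namespace prefixes that are declared only on the root element.

```python
import re
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Matches start tags, end tags, and self-closing tags (simplified, see assumptions).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")


def preparse(doc):
    """Phase 1: return (start, end) offsets of each child subtree of the root.

    Only tag boundaries and nesting depth are tracked; nothing is fully parsed.
    """
    spans, depth, start = [], 0, None
    for m in TAG.finditer(doc):
        closing, _name, self_closing = m.groups()
        if closing:
            depth -= 1
            if depth == 1:          # just closed a direct child of the root
                spans.append((start, m.end()))
        elif self_closing:
            if depth == 1:          # an empty direct child of the root
                spans.append((m.start(), m.end()))
        else:
            if depth == 1:          # opening a direct child of the root
                start = m.start()
            depth += 1
    return spans


def partition(spans, n_parts):
    """Group consecutive subtree spans into at most n_parts contiguous chunks."""
    size = max(1, -(-len(spans) // n_parts))  # ceiling division
    return [spans[i:i + size] for i in range(0, len(spans), size)]


def parse_chunk(doc, chunk):
    """Phase 2 worker: fully parse every subtree in one chunk."""
    return [ET.fromstring(doc[s:e]) for s, e in chunk]


def parallel_parse(doc, n_workers=4):
    """Preparse, partition, then parse the chunks concurrently, keeping order."""
    chunks = partition(preparse(doc), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parsed = list(pool.map(lambda c: parse_chunk(doc, c), chunks))
    return [elem for part in parsed for elem in part]
```

Calling `parallel_parse(doc, n_workers=2)` returns the parsed child elements of the root in document order. The thread pool only illustrates the structure of the approach: in CPython the global interpreter lock generally prevents threads from running Python code in parallel, so this sketch should not be expected to reproduce the speedups the paper reports for its modified libxml2 parser.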