High performance XML parsing using parallel bit stream technology

  • Authors:
  • Robert D. Cameron; Kenneth S. Herdy; Dan Lin

  • Affiliations:
  • Simon Fraser University; Simon Fraser University; Simon Fraser University

  • Venue:
  • CASCON '08: Proceedings of the 2008 conference of the center for advanced studies on collaborative research: meeting of minds
  • Year:
  • 2008

Abstract

Parabix (parallel bit streams for XML) is an open-source XML parser that employs the SIMD (single-instruction multiple-data) capabilities of modern commodity processors to deliver dramatic performance improvements over traditional byte-at-a-time parsing technology. Byte-oriented character data is first transformed into a set of 8 parallel bit streams, each stream comprising one bit per character code unit. Character validation, transcoding and lexical item stream formation are then all carried out in parallel using bitwise logic and shifting operations. Byte-at-a-time scanning loops in the parser are replaced by bit scan loops that can advance by as many as 64 positions with a single instruction. A performance study comparing Parabix with the open-source Expat and Xerces parsers is carried out using the PAPI toolkit. Total CPU cycle counts, level-2 data cache misses and branch mispredictions are measured and compared for each parser. The performance of Parabix is further studied with a breakdown of the cycle counts across the core components of the parser. Prospects for further performance improvements are also outlined, with particular emphasis on leveraging the intraregister parallelism of SIMD processing to enable intrachip parallelism on multicore architectures.
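The three core ideas in the abstract (transposing bytes into 8 bit streams, forming lexical item streams with bitwise logic and shifts, and replacing byte loops with bit scan loops) can be illustrated with a small sketch. It rests on stated assumptions: arbitrary-precision Python integers stand in for SIMD registers, stream position i is bit i of the integer (so advancing a marker one position is a left shift), and the names `transpose`, `char_class` and `scan_positions` are hypothetical, not Parabix's API. The actual parser operates on SIMD registers; this shows only the logic, not the performance.

```python
def transpose(data: bytes) -> list[int]:
    """Split bytes into 8 bit streams: bit i of streams[k] is bit k of data[i].

    Assumption: bits are numbered from the least significant end; the
    paper's own bit numbering may differ.
    """
    streams = [0] * 8
    for i, byte in enumerate(data):
        for k in range(8):
            if (byte >> k) & 1:
                streams[k] |= 1 << i
    return streams


def char_class(streams: list[int], code: int, n: int) -> int:
    """Mark every position whose byte equals `code`, using only AND and NOT."""
    mask = (1 << n) - 1  # keep NOT results within the n valid positions
    result = mask
    for k in range(8):
        # Require bit k to be 1 where code has a 1, and 0 where it has a 0.
        result &= streams[k] if (code >> k) & 1 else ~streams[k] & mask
    return result


def scan_positions(stream: int) -> list[int]:
    """Bit scan loop: jump directly from one marker bit to the next."""
    positions = []
    while stream:
        low = stream & -stream                 # isolate the lowest set bit
        positions.append(low.bit_length() - 1) # its index is the position
        stream &= stream - 1                   # clear that bit and continue
    return positions


doc = b'<a><b/></a>'
n = len(doc)
bits = transpose(doc)

lt = char_class(bits, ord('<'), n)
gt = char_class(bits, ord('>'), n)
slash = char_class(bits, ord('/'), n)

# Shifting: advance each '<' marker by one position, then keep only those
# landing on a '/', i.e. the '/' that opens an end tag such as "</a>".
end_tag_slash = (lt << 1) & slash

print(scan_positions(lt))             # [0, 3, 7]
print(scan_positions(gt))             # [2, 6, 10]
print(scan_positions(end_tag_slash))  # [8]
```

Note that all positions of `<` are classified at once by eight bitwise operations, regardless of document length, and the scan loop iterates once per marker rather than once per byte. In a C implementation, each iteration of such a loop would correspond to a bit-scan instruction on a 64-bit word, which is how a single step can skip up to 64 positions.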