Parabix: Boosting the efficiency of text processing on commodity processors

  • Authors:
  • Dan Lin;Nigel Medforth;Kenneth S. Herdy;Arrvindh Shriraman;Rob Cameron

  • Affiliations:
  • School of Computing Science, Simon Fraser University;School of Computing Science, Simon Fraser University;School of Computing Science, Simon Fraser University;School of Computing Science, Simon Fraser University;School of Computing Science, Simon Fraser University

  • Venue:
  • HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Modern applications employ text files widely for providing data storage in a readable format for applications ranging from database systems to mobile phones. Traditional text processing tools are built around a byte-at-a-time sequential processing model that introduces significant branch and cache miss penalties. Recent work has explored an alternative, transposed representation of text, Parabix (Parallel Bit Streams), to accelerate scanning and parsing using SIMD facilities. This paper advocates and develops Parabix as a general framework and toolkit, describing the software toolchain and run-time support that allows applications to exploit modern SIMD instructions for high performance text processing. The goal is to generalize the techniques to ensure that they apply across a wide variety of applications and architectures. The toolchain enables the application developer to write constructs assuming unbounded character streams and Parabix's code translator generates code based on machine specifics (e.g., SIMD register widths). The general argument in support of Parabix technology is made by a detailed performance and energy study of XML parsing across a range of processor architectures. Parabix exploits intra-core SIMD hardware and demonstrates 2脳 -- 7脳 speedup and 4脳 improvement in energy efficiency when compared with two widely used conventional software parsers, Expat and Apache-Xerces. SIMD implementations across three generations of x86 processors are studied including the new SandyBridge. The 256-bit AVX technology in Intel SandyBridge is compared with the well established 128-bit SSE technology to analyze the benefits and challenges of 3-operand instruction formats and wider SIMD hardware. Finally, the XML program is partitioned into pipeline stages to demonstrate that thread-level parallelism enables the application to exploit SIMD units scattered across the different cores, achieving improved performance (2脳 on 4 cores) while maintaining single-threaded energy levels.