Towards high-performance implementations of a custom HPC kernel using Intel® array building blocks

  • Authors:
  • Alexander Heinecke; Michael Klemm; Hans Pabst; Dirk Pflüger

  • Affiliations:
  • Technische Universität München, Garching, Germany; Intel GmbH, Feldkirchen, Germany; Intel GmbH, Feldkirchen, Germany; Technische Universität München, Garching, Germany

  • Venue:
  • Facing the Multicore-Challenge II
  • Year:
  • 2012

Abstract

Today's highly parallel machines drive a new demand for parallel programming. Fixed power envelopes, increasing problem sizes, and new algorithms pose challenging targets for developers. HPC applications must leverage SIMD units, multi-core architectures, and heterogeneous computing platforms for optimal performance. This leads to low-level, non-portable code that is difficult to write and maintain. With Intel® Array Building Blocks (Intel ArBB), programmers focus on the high-level algorithms and rely on automatic parallelization and vectorization with strong safety guarantees. Intel ArBB hides vendor-specific hardware knowledge by runtime just-in-time (JIT) compilation. This case study on data mining with adaptive sparse grids unveils how deterministic parallelism, safety, and runtime optimization make Intel ArBB practically applicable. Hand-tuned code is about 40% faster than ArBB, but needs about 8x more code. ArBB clearly outperforms standard semi-automatically parallelized C/C++ code by approximately 6x.