Parallel 3D multigrid methods on the STI Cell BE architecture

  • Authors:
  • Fabian Oboril;Jan-Philipp Weiss;Vincent Heuveline

  • Affiliations:
  • SRG New Frontiers in High Performance Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany and Engineering Mathematics and Computing Lab, Karlsruhe Institute of Technology, Karlsruhe, ...;SRG New Frontiers in High Performance Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany and Engineering Mathematics and Computing Lab, Karlsruhe Institute of Technology, Karlsruhe, ...;RG Numerical Simulation, Optimization, and High Performance Computing and Engineering Mathematics and Computing Lab, Karlsruhe Institute of Technology, Karlsruhe, Germany

  • Venue:
  • Facing the multicore-challenge
  • Year:
  • 2010

Abstract

The STI Cell Broadband Engine (BE) is a highly capable heterogeneous multicore processor with large bandwidth and computing power perfectly suited for numerical simulation. However, these performance benefits come at the price of productivity, since more responsibility is placed on the programmer. In particular, programming with the IBM Cell SDK requires the programmer not only to take care of a parallel decomposition of the problem but also to manage all data transfers and to organize all computations in a performance-beneficial manner. While this approach raises the complexity of program development, it enables efficient utilization of the available resources. In the present work we investigate the potential and the performance behavior of Cell's parallel cores for a resource-demanding and bandwidth-bound multigrid solver for a three-dimensional Poisson problem. The chosen multigrid method, based on parallel Gauß-Seidel and Jacobi smoothers, combines mathematical optimality with a high degree of inherent parallelism. We investigate dedicated code optimization strategies on the Cell platform and evaluate the associated performance benefits by a comprehensive analysis. Our results show that the Cell BE platform can give tremendous benefits for numerical simulation based on well-structured data. However, isolated, vendor-specific, but performance-optimal programming approaches will inevitably need to be replaced by portable and generic concepts like OpenCL, possibly at the price of performance loss.
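
To make the smoother mentioned in the abstract more concrete, the following is a minimal sketch of one damped Jacobi sweep for the three-dimensional Poisson problem. It is not the authors' Cell implementation: the 7-point finite-difference stencil, the zero Dirichlet boundary, the damping factor `omega = 2/3`, and all function names (`zero_grid`, `jacobi_sweep`, `residual_norm`) are assumptions chosen for illustration. The sketch is written in plain Python rather than with the Cell SDK.

```python
import math


def zero_grid(n):
    """Return an n x n x n grid of zeros as nested lists (hypothetical helper)."""
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]


def jacobi_sweep(u, f, h, omega=2.0 / 3.0):
    """One damped Jacobi sweep for -Laplace(u) = f with grid spacing h.

    Assumes a 7-point stencil; boundary values of u are left unchanged.
    Every interior update reads only the old iterate, so all points
    could be updated independently of each other.
    """
    n = len(u)
    new = [[row[:] for row in plane] for plane in u]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                neighbors = (u[i - 1][j][k] + u[i + 1][j][k]
                             + u[i][j - 1][k] + u[i][j + 1][k]
                             + u[i][j][k - 1] + u[i][j][k + 1])
                jacobi = (neighbors + h * h * f[i][j][k]) / 6.0
                new[i][j][k] = (1.0 - omega) * u[i][j][k] + omega * jacobi
    return new


def residual_norm(u, f, h):
    """Euclidean norm of the residual f - A u over the interior points."""
    n = len(u)
    total = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                neighbors = (u[i - 1][j][k] + u[i + 1][j][k]
                             + u[i][j - 1][k] + u[i][j + 1][k]
                             + u[i][j][k - 1] + u[i][j][k + 1])
                au = (6.0 * u[i][j][k] - neighbors) / (h * h)
                total += (f[i][j][k] - au) ** 2
    return math.sqrt(total)


if __name__ == "__main__":
    n = 9                      # grid points per dimension (illustrative size)
    h = 1.0 / (n - 1)
    u = zero_grid(n)
    f = [[[1.0] * n for _ in range(n)] for _ in range(n)]
    print("initial residual:", residual_norm(u, f, h))
    for _ in range(20):
        u = jacobi_sweep(u, f, h)
    print("residual after 20 sweeps:", residual_norm(u, f, h))
```

Each grid point needs only a handful of arithmetic operations but reads seven values, which illustrates why such a solver tends to be bound by memory bandwidth rather than by computing power. In a multigrid method a few sweeps of this kind serve as the smoother on each grid level; the coarse-grid correction is not shown here.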