Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE

  • Authors:
  • José M. Cecilia;José L. Abellán;Juan Fernández;Manuel E. Acacio;José M. García;Manuel Ujaldón

  • Affiliations:
  • Dept. of Computer Science, Catholic University of Murcia, Murcia, Spain;Dept. of Computer Engineering, University of Murcia, Murcia, Spain;Intel Barcelona Research Center, Intel Labs, Universitat Politècnica de Catalunya, Barcelona, Spain;Dept. of Computer Engineering, University of Murcia, Murcia, Spain;Dept. of Computer Engineering, University of Murcia, Murcia, Spain;Computer Architecture Department, University of Malaga, Malaga, Spain

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2012

Abstract

We are witnessing the consolidation of heterogeneous computing in parallel computing, with architectures such as the Cell Broadband Engine (Cell BE) and Graphics Processing Units (GPUs) present in a myriad of high-performance computing developments. These platforms provide a Software Development Kit (SDK) to maximize performance, at the expense of dealing with complex, low-level architectural details that make software development a daunting task. This paper explores stencil computations in several heterogeneous programming models, namely Cell SDK, CellSs, ALF and CUDA, to optimize the Jacobi method for solving Laplace's differential equation. We describe the programming techniques that extract maximum performance on the Cell BE and the GPU, and compare their computing paradigms. Experimental results are shown on two Nvidia Teslas and one IBM BladeCenter QS20 blade, which incorporates two 3.2 GHz Cell BEs v5.1. The speed-up factor for our set of GPU optimizations reaches 3 to 4×, and the resulting execution times are an order of magnitude lower than those of the Cell BE. The GPU versions also show great scalability when moving to newer GPU generations and/or more demanding problem sizes.
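For readers unfamiliar with the stencil at the heart of this work, the Jacobi method for Laplace's equation on a 2D grid repeatedly replaces each interior point with the average of its four neighbours from the previous iteration: u_new[i][j] = (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]) / 4. The following is a minimal, sequential reference sketch of that 5-point update. It is not the authors' Cell BE or CUDA code. The function name `jacobi_laplace`, the fixed (Dirichlet) boundary values and the fixed iteration count are illustrative assumptions.

```python
def jacobi_laplace(grid, iterations):
    """Run `iterations` Jacobi sweeps of the 5-point Laplace stencil.

    Hypothetical helper for illustration. `grid` is a list of rows of floats.
    Boundary rows/columns are treated as fixed (Dirichlet) values; only
    interior points are updated. The input grid is not modified.
    """
    rows, cols = len(grid), len(grid[0])
    # Two buffers: Jacobi reads only values from the previous sweep,
    # so new values are written to a separate array (double buffering).
    current = [row[:] for row in grid]
    nxt = [row[:] for row in grid]
    for _ in range(iterations):
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                # Average of the north, south, west and east neighbours.
                nxt[i][j] = 0.25 * (current[i - 1][j] + current[i + 1][j]
                                    + current[i][j - 1] + current[i][j + 1])
        # Swap buffers. Boundary values are identical in both, so no copy is needed.
        current, nxt = nxt, current
    return current


# Usage: a 5x5 grid whose top edge is held at 100 and the other edges at 0.
plate = [[100.0] * 5] + [[0.0] * 5 for _ in range(4)]
result = jacobi_laplace(plate, 200)
print(round(result[2][2], 6))
```

The double buffer is what distinguishes Jacobi from Gauss-Seidel: every point in a sweep depends only on the previous sweep, so all interior points can be computed independently. That independence is what lets the update be distributed across Cell BE SPEs or GPU threads. The mapping sketched here (one point per loop iteration) is only a starting point, not the optimized layouts the paper evaluates.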