Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE

Authors:
José M. Cecilia; José L. Abellán; Juan Fernández; Manuel E. Acacio; José M. García; Manuel Ujaldón
Affiliations:
Dept. of Computer Science, Catholic University of Murcia, Murcia, Spain; Dept. of Computer Engineering, University of Murcia, Murcia, Spain; Intel Barcelona Research Center, Intel Labs, Universitat Politècnica de Catalunya, Barcelona, Spain; Dept. of Computer Engineering, University of Murcia, Murcia, Spain; Dept. of Computer Engineering, University of Murcia, Murcia, Spain; Computer Architecture Department, University of Malaga, Malaga, Spain
Venue:
The Journal of Supercomputing
Year:
2012

Abstract

We are witnessing the consolidation of heterogeneous computing within parallel computing, with architectures such as the Cell Broadband Engine (Cell BE) and Graphics Processing Units (GPUs) present in a myriad of high-performance computing developments. These platforms provide a Software Development Kit (SDK) to maximize performance, at the expense of exposing complex, low-level architectural details that make software development a daunting task. This paper explores stencil computations in several heterogeneous programming models, such as the Cell SDK, CellSs, ALF and CUDA, to optimize the Jacobi method for solving Laplace's partial differential equation. We describe the programming techniques needed to extract maximum performance from the Cell BE and the GPU, and compare their computing paradigms. Experimental results are shown on two NVIDIA Tesla GPUs and one IBM BladeCenter QS20 blade, which incorporates two 3.2 GHz Cell BE processors (v5.1). Our set of GPU optimizations achieves a speed-up factor of 3 to 4×, and the resulting GPU execution times are an order of magnitude lower than those of the Cell BE. The GPU implementation also scales well when moving to newer GPU generations and/or more demanding problem sizes.
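For reference, the computation being optimized can be sketched as a plain sequential C baseline. This is a minimal sketch, not the authors' code: it assumes a square n × n grid stored row-major, a 5-point stencil, fixed (Dirichlet) boundary values, and a stopping test on the largest per-sweep change. Each sweep applies u[i][j](k+1) = (u[i-1][j](k) + u[i+1][j](k) + u[i][j-1](k) + u[i][j+1](k)) / 4 to every interior point, reading only the previous iterate, so two buffers are swapped after each sweep. The names make_grid, jacobi_sweep and jacobi_solve are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zero-initialized n x n row-major grid
   whose top boundary row is fixed to `top` (a Dirichlet condition). */
double *make_grid(int n, double top) {
    double *g = calloc((size_t)n * (size_t)n, sizeof *g);
    if (g != NULL) {
        for (int j = 0; j < n; ++j) g[j] = top;
    }
    return g;
}

/* One Jacobi sweep of the 5-point Laplace stencil: each interior point of
   u_new becomes the average of its four neighbors in u. Boundary cells are
   never written. Returns the largest absolute change made in this sweep. */
double jacobi_sweep(const double *u, double *u_new, int n) {
    double max_diff = 0.0;
    for (int i = 1; i < n - 1; ++i) {
        for (int j = 1; j < n - 1; ++j) {
            double v = 0.25 * (u[(i - 1) * n + j] + u[(i + 1) * n + j] +
                               u[i * n + (j - 1)] + u[i * n + (j + 1)]);
            double d = fabs(v - u[i * n + j]);
            if (d > max_diff) max_diff = d;
            u_new[i * n + j] = v;
        }
    }
    return max_diff;
}

/* Sweep until the largest change drops below tol or max_iter sweeps have
   run. The buffers swap roles after every sweep (double buffering), so both
   must hold identical boundary values. On return *u points at the newest
   iterate; the return value is the number of sweeps performed. */
int jacobi_solve(double **u, double **tmp, int n, double tol, int max_iter) {
    assert(n >= 3 && u != NULL && tmp != NULL);
    int k = 0;
    while (k < max_iter) {
        double diff = jacobi_sweep(*u, *tmp, n);
        double *t = *u; /* swap: the new iterate becomes the input */
        *u = *tmp;
        *tmp = t;
        ++k;
        if (diff < tol) break;
    }
    return k;
}
```

As a usage check, on a 33 × 33 grid with the top boundary held at 1 and the other sides at 0, the converged value at the center point is 0.25: the four rotations of this problem sum to the all-ones solution, and the center is invariant under rotation. Because every update in a sweep reads only the previous iterate, the points within a sweep are independent, which is what makes the Jacobi method map naturally onto one GPU thread (or one SPE work chunk) per point or tile; the Cell BE and CUDA optimizations evaluated in the paper are not reproduced here.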