Implementation and performance modeling of deterministic particle transport (Sweep3D) on the IBM Cell/B.E.

Authors:
Olaf Lubeck;Michael Lang;Ram Srinivasan;Greg Johnson
Affiliations:
Los Alamos National Laboratory, Los Alamos, NM, USA; (Corresponding author: Michael Lang, Los Alamos National Laboratory, TA3 Bldg 2011, Los Alamos, NM 87545, USA. Tel.: +1 505 665 5756; Fax: +1 505 665 4939; E-mail: mlang@lanl.gov) Los Alamos Natio ...; Intel, Fort Collins, CO, USA; Google, Mountain View, CA, USA
Venue:
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Year:
2009

Abstract

The IBM Cell Broadband Engine (Cell/B.E.) is a novel multi-core chip with the potential to deliver the demanding floating-point performance required for high-fidelity scientific simulations. However, data movement within the chip can be a major obstacle to realizing the benefits of its peak floating-point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message-passing model that minimizes data movement. We compare the advantages and disadvantages of this programming model with those of a previous implementation that used a master-worker threading strategy. We apply a previously validated micro-architecture performance model, based on our earlier work on Monte Carlo performance models, to the application executing on the Cell/B.E.; the model predicts overall CPI (cycles per instruction) and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess how possible future design parameters for the Cell/B.E. micro-architecture would affect performance. The methodologies and results have broader implications that extend to other multi-core architectures.