Multi-core acceleration of chemical kinetics for simulation and prediction

Authors:
John C. Linford;John Michalakes;Manish Vachharajani;Adrian Sandu
Affiliations:
Virginia Polytechnic Institute and State University, Blacksburg, VA;National Center for Atmospheric Research, Boulder, CO;University of Colorado at Boulder, Boulder, CO;Virginia Polytechnic Institute and State University, Blacksburg, VA
Venue:
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Year:
2009

Abstract

This work implements a computationally expensive chemical kinetics kernel from a large-scale community atmospheric model on three multi-core platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine, and Intel Quad-Core Xeon CPUs. A comparative performance analysis of each platform, in double and single precision and on coarse and fine grids, is presented. Platform-specific design and optimization are discussed in a mechanism-agnostic way, so that the same optimizations apply to many chemical mechanisms. The implementation of a three-stage Rosenbrock solver for SIMD architectures is discussed. When used as a template mechanism in the Kinetic PreProcessor, the multi-core implementation enables the automatic optimization and porting of many chemical mechanisms to a variety of multi-core platforms. Speedups of 5.5x in single precision and 2.7x in double precision are observed when compared to eight Xeon cores. Compared to the serial implementation, the maximum observed speedup is 41.1x in single precision.
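
As a rough illustration of the linearly implicit structure that Rosenbrock solvers share, the sketch below performs a one-stage Rosenbrock step (linearly implicit Euler), not the three-stage solver described in the paper. It is applied to a hypothetical two-species mechanism A <-> B, and every grid cell is integrated independently of the others. The rate constants, function names, and cell values are assumptions made for illustration; none of them come from the paper or from the Kinetic PreProcessor.

```python
def rosenbrock_euler_step(y, h, k1, k2):
    """One linearly implicit Euler (one-stage Rosenbrock) step in one cell.

    Toy mechanism (an assumption, not from the paper): A -> B at rate k1,
    B -> A at rate k2. Solves (I - h*J) * delta = h * f(y), then y + delta.
    """
    a, b = y
    # Right-hand side f(y) of the toy mechanism.
    fa = -k1 * a + k2 * b
    fb = k1 * a - k2 * b
    # M = I - h*J, where J = [[-k1, k2], [k1, -k2]] is the Jacobian of f.
    m11 = 1.0 + h * k1
    m12 = -h * k2
    m21 = -h * k1
    m22 = 1.0 + h * k2
    # Solve the 2x2 linear system M * delta = h * f with Cramer's rule.
    det = m11 * m22 - m12 * m21
    da = (h * fa * m22 - m12 * h * fb) / det
    db = (m11 * h * fb - m21 * h * fa) / det
    return (a + da, b + db)


def integrate_cells(cells, h, steps, k1, k2):
    """Advance every grid cell independently for a fixed number of steps.

    No cell reads another cell's state, so this outer loop is the
    data-parallel axis of the computation.
    """
    result = []
    for y in cells:
        for _ in range(steps):
            y = rosenbrock_euler_step(y, h, k1, k2)
        result.append(y)
    return result


if __name__ == "__main__":
    # Hypothetical stiff rates and initial concentrations (A, B) per cell.
    cells = [(1.0, 0.0), (0.5, 0.5), (0.0, 2.0)]
    for a, b in integrate_cells(cells, h=1.0, steps=50, k1=1.0e4, k2=1.0e2):
        print(a, b, a + b)
```

Each step needs only one linear solve with the matrix I - h*J and no nonlinear iteration, which is the property that keeps the step stable here even though the step size is far larger than the fast time scale 1/(k1 + k2). Because no cell depends on another, the loop over cells in this toy is the kind of independent work that lends itself to data-parallel hardware.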