The performance of a typical chemical transport model is measured on two multicore processors: the heterogeneous Cell Broadband Engine and the homogeneous Intel Quad-Core Xeon shared-memory multiprocessor. Two problem decomposition techniques are discussed. Dimension splitting promotes parallelization in chemical transport models, and time splitting reduces truncation error. Additionally, a scalable method is presented for accessing any row or column of a matrix of arbitrary size from the accelerator units of the Cell Broadband Engine. This access method increases chemical transport model efficiency by an average of 30% and significantly improves the scalability of dimension-splitting techniques on the Cell Broadband Engine. Experiments show that chemical transport models using only six accelerator units of the Cell Broadband Engine are 31% more efficient than on a shared-memory multiprocessor with eight executing cores. Our fully optimized models achieve an average speedup of 118% on the Cell Broadband Engine and an average speedup of 87.5% on a shared-memory multiprocessor with OpenMP.
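The two decomposition ideas can be illustrated with a small sketch. This is not the authors' model or code. It assumes a hypothetical 2-D periodic grid with constant positive wind, first-order upwind advection, and first-order decay standing in for chemistry. All function names are hypothetical.

Dimension splitting breaks transport into independent 1-D sweeps: every row in the x-sweep and every column in the y-sweep. Those sweeps can be handed to separate workers. Time splitting (Strang form) separates transport from chemistry as half-step transport, full-step chemistry, then half-step transport. `ThreadPoolExecutor` is only a stand-in for distributing rows or columns across cores or accelerator units. In CPython it shows the independence of the sweeps, not a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def upwind_1d(line, courant):
    """One first-order upwind advection step on a periodic 1-D line.
    Assumes positive velocity and 0 <= courant <= 1 (stable, mass-conserving)."""
    n = len(line)
    # line[i - 1] with i == 0 wraps to line[-1]: periodic boundary
    return [line[i] - courant * (line[i] - line[i - 1]) for i in range(n)]


def transport_step(grid, cx, cy, pool):
    """Dimension-split transport: an x-sweep over rows, then a y-sweep over columns.
    Each row (and each column) is an independent 1-D problem, so sweeps are mapped
    over the worker pool. Plain x-then-y ordering is first-order splitting;
    a symmetric ordering would be needed for full second-order accuracy."""
    grid = list(pool.map(lambda row: upwind_1d(row, cx), grid))
    # Transpose so columns become contiguous lines, sweep them, transpose back
    cols = [list(col) for col in zip(*grid)]
    cols = list(pool.map(lambda col: upwind_1d(col, cy), cols))
    return [list(row) for row in zip(*cols)]


def chemistry_step(grid, k, dt):
    """Hypothetical chemistry stand-in: dc/dt = -k*c integrated exactly.
    Every grid cell is independent."""
    factor = math.exp(-k * dt)
    return [[c * factor for c in row] for row in grid]


def strang_step(grid, cx_half, cy_half, k, dt, pool):
    """Strang time splitting: transport(dt/2), chemistry(dt), transport(dt/2).
    cx_half and cy_half are the Courant numbers for a half step."""
    grid = transport_step(grid, cx_half, cy_half, pool)
    grid = chemistry_step(grid, k, dt)
    return transport_step(grid, cx_half, cy_half, pool)


if __name__ == "__main__":
    nx, ny = 8, 6
    grid = [[0.0] * nx for _ in range(ny)]
    grid[2][3] = 1.0  # a single puff of tracer
    dt, k = 0.1, 0.5
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(10):
            grid = strang_step(grid, 0.25, 0.2, k, dt, pool)
    total = sum(map(sum, grid))
    # Periodic upwind transport conserves mass, so only chemistry changes the total
    print(f"total mass {total:.6f}, exact decay {math.exp(-k * dt * 10):.6f}")
```

Because transport conserves mass on the periodic grid, the printed total should agree with pure exponential decay. This gives a quick consistency check on the splitting.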