Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Using the parallelism in these chipsets effectively is a central challenge for a new generation of large-scale scientific computing applications. This study examines methods for improving the performance of two-dimensional and three-dimensional atmospheric constituent transport simulation on the Cell Broadband Engine Architecture (CBEA). A function offloading approach is used in a 2D transport module, and a vector stream processing approach is used in a 3D transport module. Two methods for transferring non-contiguous data between main memory and accelerator local storage are compared. By leveraging the heterogeneous parallelism of the CBEA, the 3D transport module on a single PowerXCell 8i chip achieves performance comparable to two nodes of an IBM BlueGene/P, or to eight Intel Xeon cores. Module performance is reported for two CBEA systems, an IBM BlueGene/P, and an eight-core shared-memory Intel Xeon workstation.
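The abstract does not say which two transfer methods are compared, so the following is only an illustrative sketch of the underlying problem, not the authors' implementation. It assumes a row-major 2D grid stored as a flat list: one column of such a grid is non-contiguous in memory and has to be gathered into a contiguous buffer before an accelerator could process it, then scattered back afterwards. The function names `gather_column` and `scatter_column` are hypothetical, and plain Python lists stand in for main memory and local storage; no real DMA is involved.

```python
def gather_column(grid, n_rows, n_cols, col):
    """Copy one column of a row-major flat grid into a contiguous list.

    Consecutive column elements are n_cols entries apart in `grid`,
    so each element needs its own strided read.
    """
    return [grid[r * n_cols + col] for r in range(n_rows)]


def scatter_column(grid, n_rows, n_cols, col, values):
    """Write a contiguous buffer back into one column of the flat grid."""
    for r in range(n_rows):
        grid[r * n_cols + col] = values[r]


# Hypothetical 3 x 4 grid holding the values 0..11 in row-major order.
n_rows, n_cols = 3, 4
grid = list(range(n_rows * n_cols))

# Gather column 1 (stride n_cols), update it, and scatter it back.
column = gather_column(grid, n_rows, n_cols, 1)
updated = [2 * v for v in column]
scatter_column(grid, n_rows, n_cols, 1, updated)
```

In this sketch a row transfer would be a single contiguous slice, while a column transfer touches `n_rows` separate locations. That asymmetry is the reason the choice of strategy for non-contiguous transfers can affect performance.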