Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems, and using that parallelism effectively is paramount. We present a 3D chemical transport module optimized for the Cell Broadband Engine Architecture (CBEA). By exploiting the heterogeneous parallelism of the Cell with a method we call vector stream processing, the module achieves, on a single Cell chip, performance comparable to two nodes of an IBM BlueGene/P or eight Xeon cores. We report the module's performance on two CBEA systems, an IBM BlueGene/P, and an eight-core shared-memory Intel Xeon workstation.