Vector stream processing for effective application of heterogeneous parallelism

Authors:
John C. Linford; Adrian Sandu
Affiliations:
Virginia Polytechnic Institute and State University, Blacksburg, VA; Virginia Polytechnic Institute and State University, Blacksburg, VA
Venue:
Proceedings of the 2009 ACM Symposium on Applied Computing
Year:
2009

Citing 11
Cited 2

Numerical computation of internal & external flows: fundamentals of numerical discretization

Numerical computation of internal & external flows: fundamentals of numerical discretization
Adjoint sensitivity analysis of regional air quality models

Journal of Computational Physics
The potential of the Cell processor for scientific computing

Proceedings of the 3rd conference on Computing frontiers
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Dynamic multigrain parallelization on the Cell Broadband Engine

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
On Characterizing Performance of the Cell Broadband Engine Element Interconnect Bus

NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Cell Broadband Engine Architecture and its first implementation: a performance view

IBM Journal of Research and Development
DMA-based prefetching for I/O-intensive workloads on the Cell architecture

Proceedings of the 5th conference on Computing frontiers
Implementing Wilson-Dirac operator on the Cell Broadband Engine

Proceedings of the 22nd annual international conference on Supercomputing
Optimizing large scale chemical transport models for multicore platforms

Proceedings of the 2008 Spring simulation multiconference

Development and acceleration of parallel chemical transport models

SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
Scalable heterogeneous parallelism for atmospheric modeling and simulation

The Journal of Supercomputing

Abstract

Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems, and using that parallelism effectively is paramount. We present a 3D chemical transport module optimized for the Cell Broadband Engine Architecture (CBEA). By leveraging the heterogeneous parallelism of the Cell with a method we call vector stream processing, our transport module achieves, on a single Cell chip, performance comparable to two nodes of an IBM BlueGene/P or to eight Xeon cores. We report the module's performance on two CBEA systems, an IBM BlueGene/P, and an eight-core shared-memory Intel Xeon workstation.