While accelerators have become more prevalent in recent years, they are still considered hard to program. In this work, we extend a parallel programming framework so that programmers can use the Cell processor's Synergistic Processing Elements (SPEs) as seamlessly as possible. With this framework, the same application code can be compiled and executed on multiple platforms, including x86-based and Cell-based clusters. Furthermore, our model allows independently developed libraries to time-share one or more SPEs efficiently by interleaving their work. To demonstrate the framework, we present performance data for an example molecular dynamics (MD) application. Compared with a single Xeon core using Streaming SIMD Extensions (SSE), the MD program achieves a speedup of 5.74 on a single Cell chip (with 8 SPEs). By comparison, six Xeon (x86) cores achieve a similar speedup of 5.89.
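To put the two reported speedups on a common footing, the short sketch below divides each by the number of processing units used. It relies only on the figures quoted above; the names `REPORTED` and `speedup_per_unit` are illustrative and not part of the framework. Since both speedups are measured against one SSE-enabled Xeon core, the Cell figure is a per-SPE contribution relative to that core rather than a parallel efficiency in the usual sense.

```python
# Speedups quoted in the abstract, both relative to one SSE-enabled Xeon core.
# Each entry maps a platform label to (reported speedup, processing units used).
REPORTED = {
    "Cell chip (8 SPEs)": (5.74, 8),
    "Xeon (6 cores)": (5.89, 6),
}


def speedup_per_unit(speedup: float, units: int) -> float:
    """Return the share of the total speedup attributable to each unit."""
    return speedup / units


# Per-unit contribution for each platform.
per_unit = {
    name: speedup_per_unit(speedup, units)
    for name, (speedup, units) in REPORTED.items()
}

for name, value in per_unit.items():
    print(f"{name}: {value:.2f} of one SSE Xeon core per unit")
```

On these numbers, each SPE contributes roughly 0.72 of a single SSE Xeon core, while each of the six Xeon cores contributes roughly 0.98, which is why eight SPEs and six Xeon cores end up with similar overall speedups.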