S/390 cluster technology: Parallel Sysplex
IBM Systems Journal
Using MPI (2nd ed.): portable parallel programming with the message-passing interface
Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
Using MPI-2: Advanced Features of the Message Passing Interface
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Mapping unstructured applications into nested parallelism
VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
CellSs: making it easier to program the cell broadband engine processor
IBM Journal of Research and Development
Accelerating computing with the cell broadband engine processor
Proceedings of the 5th conference on Computing frontiers
A lightweight streaming layer for multicore execution
ACM SIGARCH Computer Architecture News
A Buffered-Mode MPI Implementation for the Cell BE™ Processor
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
Radioastronomy Image Synthesis on the Cell/B.E.
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
A Constraint Programming Approach for Allocation and Scheduling on the CELL Broadband Engine
CP '08 Proceedings of the 14th international conference on Principles and Practice of Constraint Programming
SPENK: adding another level of parallelism on the cell broadband engine
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Building high-resolution sky images using the Cell/B.E.
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Celling SHIM: compiling deterministic concurrency to a heterogeneous multicore
Proceedings of the 2009 ACM symposium on Applied Computing
A Unified Runtime System for Heterogeneous Multi-core Architectures
Euro-Par 2008 Workshops - Parallel Processing
Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Towards a framework for abstracting accelerators in parallel applications: experience with cell
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Optimization of collective communication in intra-cell MPI
HiPC'07 Proceedings of the 14th international conference on High performance computing
State-of-the-art in heterogeneous computing
Scientific Programming
IBM Journal of Research and Development
The reverse-acceleration model for programming petascale hybrid systems
IBM Journal of Research and Development
MapReduce for the cell broadband engine architecture
IBM Journal of Research and Development
Recursion-driven parallel code generation for multi-core platforms
Proceedings of the Conference on Design, Automation and Test in Europe
Monte Carlo implementation of financial simulation on Cell/B.E. multi-core processor
Mathematics and Computers in Simulation
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Programming heterogeneous clusters with accelerators using object-based programming
Scientific Programming
Transactions on high-performance embedded architectures and compilers III
International Journal of Communication Networks and Distributed Systems
MPOpt-Cell: a high-performance data-flow programming environment for the CELL BE processor
Proceedings of the 8th ACM International Conference on Computing Frontiers
HOMPI: a hybrid programming framework for expressing and deploying task-based parallelism
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
A comparison of three commodity-level parallel architectures: multi-core CPU, cell BE and GPU
MMCS'08 Proceedings of the 7th international conference on Mathematical Methods for Curves and Surfaces
A synchronous mode MPI implementation on the Cell BE™ architecture
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Adaptive communication mechanism for accelerating MPI functions in NoC-based multicore processors
ACM Transactions on Architecture and Code Optimization (TACO)
The Cell Broadband Engine™ processor employs multiple accelerators, called synergistic processing elements (SPEs), for high performance. Each SPE has a high-speed local store connected to main memory through direct memory access (DMA), but a drawback of this design is that the local store is not large enough to hold an entire application's code or data. The application must therefore be decomposed into pieces small enough to fit into the local store, and those pieces must be swapped in and out via DMA without losing the performance gain of the multiple SPEs. We propose a new programming model, the MPI microtask, based on the standard Message Passing Interface (MPI) programming model for distributed-memory parallel machines. In our model, programmers do not need to manage the local store, as long as they partition their application into a collection of small microtasks that each fit into it. Furthermore, the preprocessor and runtime in our microtask system optimize the execution of microtasks by exploiting the explicit communication in the MPI model. We have built a prototype that includes a novel static scheduler for such optimizations, and our initial experiments show encouraging results.
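The decomposition constraint described in the abstract can be sketched as follows. This is an illustrative model only, not the paper's MPI microtask system: it splits a workload into "microtasks" whose working set fits within a fixed local-store budget, standing in for an SPE's 256 KB local store. The code-footprint reservation and all names here are hypothetical.

```python
LOCAL_STORE_BYTES = 256 * 1024   # Cell SPE local store size
CODE_FOOTPRINT = 64 * 1024       # hypothetical: space reserved for task code
DATA_BUDGET = LOCAL_STORE_BYTES - CODE_FOOTPRINT

def make_microtasks(data, item_bytes):
    """Split `data` into chunks whose byte size fits the data budget."""
    items_per_task = max(1, DATA_BUDGET // item_bytes)
    return [data[i:i + items_per_task]
            for i in range(0, len(data), items_per_task)]

def run_microtask(chunk):
    # Stand-in for: DMA the chunk into the local store, compute on it,
    # then DMA the result back to main memory.
    return sum(chunk)

data = list(range(100_000))                  # too large for one local store
tasks = make_microtasks(data, item_bytes=8)  # e.g. 8-byte doubles
result = sum(run_microtask(t) for t in tasks)
assert result == sum(data)                   # combining partial results
```

In the actual system, each microtask would additionally communicate with its peers through explicit MPI-style messages, which is what allows the preprocessor and runtime to schedule DMA transfers statically.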