S/390 cluster technology: Parallel Sysplex
IBM Systems Journal
Using MPI (2nd ed.): portable parallel programming with the message-passing interface
Static scheduling algorithms for allocating directed task graphs to multiprocessors
ACM Computing Surveys (CSUR)
Using MPI-2: Advanced Features of the Message Passing Interface
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Mapping unstructured applications into nested parallelism
VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
CellSs: making it easier to program the cell broadband engine processor
IBM Journal of Research and Development
Accelerating computing with the cell broadband engine processor
Proceedings of the 5th conference on Computing frontiers
A lightweight streaming layer for multicore execution
ACM SIGARCH Computer Architecture News
A Buffered-Mode MPI Implementation for the Cell BE™ Processor
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
Radioastronomy Image Synthesis on the Cell/B.E.
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
A Constraint Programming Approach for Allocation and Scheduling on the CELL Broadband Engine
CP '08 Proceedings of the 14th international conference on Principles and Practice of Constraint Programming
SPENK: adding another level of parallelism on the cell broadband engine
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Building high-resolution sky images using the Cell/B.E.
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Celling SHIM: compiling deterministic concurrency to a heterogeneous multicore
Proceedings of the 2009 ACM symposium on Applied Computing
A Unified Runtime System for Heterogeneous Multi-core Architectures
Euro-Par 2008 Workshops - Parallel Processing
Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Towards a framework for abstracting accelerators in parallel applications: experience with cell
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Optimization of collective communication in intra-cell MPI
HiPC'07 Proceedings of the 14th international conference on High performance computing
State-of-the-art in heterogeneous computing
Scientific Programming
IBM Journal of Research and Development
The reverse-acceleration model for programming petascale hybrid systems
IBM Journal of Research and Development
MapReduce for the cell broadband engine architecture
IBM Journal of Research and Development
Recursion-driven parallel code generation for multi-core platforms
Proceedings of the Conference on Design, Automation and Test in Europe
Monte Carlo implementation of financial simulation on Cell/B.E. multi-core processor
Mathematics and Computers in Simulation
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Programming heterogeneous clusters with accelerators using object-based programming
Scientific Programming
Transactions on high-performance embedded architectures and compilers III
International Journal of Communication Networks and Distributed Systems
MPOpt-Cell: a high-performance data-flow programming environment for the CELL BE processor
Proceedings of the 8th ACM International Conference on Computing Frontiers
HOMPI: a hybrid programming framework for expressing and deploying task-based parallelism
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
A comparison of three commodity-level parallel architectures: multi-core CPU, cell BE and GPU
MMCS'08 Proceedings of the 7th international conference on Mathematical Methods for Curves and Surfaces
A synchronous mode MPI implementation on the Cell BE™ architecture
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Adaptive communication mechanism for accelerating MPI functions in NoC-based multicore processors
ACM Transactions on Architecture and Code Optimization (TACO)
The Cell Broadband Engine™ processor employs multiple accelerators, called synergistic processing elements (SPEs), for high performance. Each SPE has a high-speed local store connected to main memory through direct memory access (DMA), but a drawback of this design is that the local store is not large enough to hold an entire application's code or data. The application must therefore be decomposed into pieces small enough to fit into the local store, and those pieces must be swapped in and out via DMA without losing the performance gain of the multiple SPEs. We propose a new programming model, the MPI microtask, based on the standard Message Passing Interface (MPI) programming model for distributed-memory parallel machines. In our model, programmers do not need to manage the local store, as long as they partition their application into a collection of small microtasks that each fit into it. Furthermore, the preprocessor and runtime in our microtask system optimize the execution of microtasks by exploiting the explicit communication in the MPI model. We have built a prototype that includes a novel static scheduler for such optimizations, and our initial experiments show encouraging results.
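The decomposition constraint described in the abstract can be sketched as follows. This is an illustrative model only, not the paper's MPI microtask system: it splits a workload into "microtasks" whose working set fits within a fixed local-store budget, standing in for an SPE's 256 KB local store. The code-footprint reservation and all names here are hypothetical.

```python
LOCAL_STORE_BYTES = 256 * 1024   # Cell SPE local store size
CODE_FOOTPRINT = 64 * 1024       # hypothetical: space reserved for task code
DATA_BUDGET = LOCAL_STORE_BYTES - CODE_FOOTPRINT

def make_microtasks(data, item_bytes):
    """Split `data` into chunks whose byte size fits the data budget."""
    items_per_task = max(1, DATA_BUDGET // item_bytes)
    return [data[i:i + items_per_task]
            for i in range(0, len(data), items_per_task)]

def run_microtask(chunk):
    # Stand-in for: DMA the chunk into the local store, compute on it,
    # then DMA the result back to main memory.
    return sum(chunk)

data = list(range(100_000))                  # too large for one local store
tasks = make_microtasks(data, item_bytes=8)  # e.g. 8-byte doubles
result = sum(run_microtask(t) for t in tasks)
assert result == sum(data)                   # combining partial results
```

In the actual system, each microtask would additionally communicate with its peers through explicit MPI-style messages, which is what allows the preprocessor and runtime to schedule DMA transfers statically.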