The Single-chip Cloud Computer (SCC) is an experimental processor created by Intel Labs. It is a distributed-memory architecture that also provides shared-memory capabilities and an on-die Message Passing Buffer (MPB). This paper presents an MPI implementation (RCKMPI) that uses an efficient mix of the MPB and DDR3 shared memory for low-level communication. The on-die buffer of the SCC provides higher bandwidth and lower latency than the available shared memory. In spite of this, message passing can be faster through DDR3, due to protocol overheads caused by the small size of the MPB, the need to split and reassemble large packets, and the possibility that the data is not resident in the cache. These overheads dominate beyond certain message sizes, requiring run-time decisions about which type of buffer to use in order to achieve higher performance. In the current implementation, the decision is based on the number of bytes remaining to be transferred from in-transit packets. MPI benchmarks demonstrate that using both types of buffers results in transmission times equal to or lower than those obtained when communicating through the on-die buffer alone.
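To make the run-time decision concrete, the following is a minimal C sketch of the kind of channel-selection heuristic the abstract describes: small remaining payloads go through the on-die MPB, larger ones through DDR3 shared memory. All names (MPB_SLOT_BYTES, DDR3_SWITCH_THRESHOLD, channel_t, select_channel) and the threshold value are illustrative assumptions, not the actual RCKMPI code.

    #include <stddef.h>

    /* Assumed per-core MPB slot size on the SCC; the real value and the
     * crossover point would be tuned empirically, as the paper suggests. */
    #define MPB_SLOT_BYTES        8192
    #define DDR3_SWITCH_THRESHOLD (4 * MPB_SLOT_BYTES)

    typedef enum { CHANNEL_MPB, CHANNEL_DDR3 } channel_t;

    /* Decide which buffer type to use for the next fragment of an
     * in-transit packet, based on how many bytes are still left to send.
     * Small remainders fit in the MPB without further splitting; large
     * remainders would pay repeated split/reassemble overhead in the small
     * MPB, so DDR3 wins despite its higher per-access latency. */
    channel_t select_channel(size_t remaining_bytes)
    {
        if (remaining_bytes <= DDR3_SWITCH_THRESHOLD)
            return CHANNEL_MPB;   /* low latency, high bandwidth for small data */
        return CHANNEL_DDR3;      /* avoids repeated MPB fill/drain round trips */
    }

A sender loop would then consult select_channel(bytes_left) before writing each fragment, so the bulk of a large message travels through DDR3 while its tail, once below the threshold, switches back to the faster on-die path.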