Adaptive communication mechanism for accelerating MPI functions in NoC-based multicore processors

Authors:
Libo Huang, Zhiying Wang, Nong Xiao, Yongwen Wang, Qiang Dou
Affiliations:
National University of Defense Technology, Changsha, Hunan Province, China (all authors)
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2008

Abstract

Multicore designs have emerged as the dominant organization for future high-performance microprocessors. Communication in such designs is often enabled by Networks-on-Chip (NoCs). A new trend in such architectures is to fit a Message Passing Interface (MPI) programming model onto NoCs to achieve high parallel application performance. A key issue in designing MPI over NoCs is the communication protocol, which has not been explored in previous research. This article advocates a hardware-supported communication mechanism that uses a protocol-adaptive approach to adjust to varying NoC configurations (e.g., number of buffers) and workload behavior (e.g., number of messages). We propose the ADaptive Communication Mechanism (ADCM), a hybrid protocol whose behavior ranges from that of buffered communication, when sufficient buffer space is available in the receiver, to that of a synchronous protocol, when buffers in the receiver are limited. ADCM adapts dynamically by deciding the communication protocol on a per-request basis using a local estimate of recent buffer utilization. ADCM attempts to combine the advantages of both the buffered and synchronous communication modes to achieve enhanced throughput and performance. Simulations of various workloads show that the proposed communication mechanism can be effectively used in future NoC designs.
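
The per-request decision described in the abstract can be illustrated with a small software sketch. It is not the article's hardware design: the abstract does not say how the local estimate is computed, so the sliding-window mean, the `window` and `threshold` parameters, and the names `AdaptiveSender`, `observe`, `estimate`, and `choose` are all assumptions made for illustration.

```python
from collections import deque
from enum import Enum


class Protocol(Enum):
    BUFFERED = "buffered"        # send at once; the receiver buffers the message
    SYNCHRONOUS = "synchronous"  # handshake first; send when the receiver is ready


class AdaptiveSender:
    """Hypothetical per-request protocol selector (illustrative only)."""

    def __init__(self, window=4, threshold=0.75):
        # Assumed parameters: how many recent samples to keep, and the
        # utilization level at which the sender stops using buffered mode.
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, used_buffers, total_buffers):
        """Record one local observation of receiver buffer occupancy."""
        self.recent.append(used_buffers / total_buffers)

    def estimate(self):
        """Mean utilization over the window; 0.0 if nothing was observed yet."""
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def choose(self):
        """Pick the protocol for the next request from the local estimate."""
        if self.estimate() < self.threshold:
            return Protocol.BUFFERED
        return Protocol.SYNCHRONOUS
```

In this sketch a sender would call `observe` whenever it learns the receiver's buffer occupancy and call `choose` before each send, so the protocol can change from one request to the next as the estimate moves across the threshold.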