Hardware Support for Broadcast and Reduce in MPSoC

Authors:
Yuanxi Peng; Manuel Saldana; Paul Chow
Venue:
FPL '11: Proceedings of the 2011 21st International Conference on Field Programmable Logic and Applications
Year:
2011

Abstract

MPI has been used as a parallel programming model not only for supercomputers and clusters but also for Multiprocessor Systems-on-Chip (MPSoCs). Collective communication is one component of MPI, and its performance is key to achieving good speedups in parallel applications. Considerable research has been done to optimize such communication by improving the algorithms in the MPI library. However, these optimizations focus on the processing nodes (the end-points of the network) rather than on the network itself. In this paper, we target a Network-on-Chip (NoC) and modify it to provide hardware support for the broadcast and reduce operations of the ArchES-MPI library. This library is a subset implementation of the MPI standard that targets embedded processors and hardware accelerators implemented in FPGAs. The experimental results show that, for a system with 24 embedded processors, the broadcast and reduce operations improved by up to 11.4-fold and 22-fold, respectively. Higher benefits are expected for larger systems, at the expense of a modest increase in resource utilization.
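The contrast between end-point and in-network collectives can be illustrated with a toy model. The sketch below compares a software reduce done at the end-points with a binomial tree (a common MPI library algorithm) against a hypothetical in-network reduce in which routers combine packets as they travel toward the root. The router fan-in of 4, the tree-shaped topology, and the function names `sw_binomial_reduce` and `hw_tree_reduce` are illustrative assumptions, not the ArchES-MPI or NoC design described in the paper. The model only counts rounds, router levels, and messages. It does not model latency and does not reproduce the reported speedups.

```python
from functools import reduce as fold
from operator import add


def sw_binomial_reduce(values, op):
    """End-point reduce using a binomial tree; the result lands at rank 0.

    Every combine step runs on a processing node. Returns
    (result, rounds, messages), where each non-root rank sends exactly once.
    """
    vals = list(values)
    p = len(vals)
    rounds = messages = 0
    step = 1
    while step < p:
        # In this round, rank r receives from rank r + step and combines.
        for rank in range(0, p, 2 * step):
            partner = rank + step
            if partner < p:
                vals[rank] = op(vals[rank], vals[partner])
                messages += 1
        step *= 2
        rounds += 1
    return vals[0], rounds, messages


def hw_tree_reduce(values, op, fan_in=4):
    """Hypothetical in-network reduce (assumed router tree, assumed fan-in).

    Each end-point injects one packet. Each router combines up to `fan_in`
    incoming packets and forwards a single packet upstream, so processors
    do no combining. Returns (result, router_levels).
    """
    level = list(values)
    levels = 0
    while len(level) > 1:
        level = [fold(op, level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        levels += 1
    return level[0], levels


if __name__ == "__main__":
    vals = list(range(24))  # 24 end-points, matching the system size above
    print(sw_binomial_reduce(vals, add))  # (276, 5, 23)
    print(hw_tree_reduce(vals, add))      # (276, 3)
```

With 24 end-points, the software binomial tree needs ceil(log2 24) = 5 rounds, and each round involves processor-side receive and combine work. In the assumed fan-in-4 router tree, the combining is spread over 3 router levels and never touches the processors. Both approaches produce the same result for an associative operator such as addition.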