Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs

Authors:
José L. Abellán; Juan Fernández; Manuel E. Acacio; Davide Bertozzi; Daniele Bortolotti; Andrea Marongiu; Luca Benini
Affiliations:
DiTEC, University of Murcia, Murcia, Spain; DiTEC, University of Murcia, Murcia, Spain; DiTEC, University of Murcia, Murcia, Spain; ENDIF, University of Ferrara, Ferrara, Italy; DEIS, University of Bologna, Bologna, Italy; DEIS, University of Bologna, Bologna, Italy; DEIS, University of Bologna, Bologna, Italy
Venue:
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2012



Abstract

Barrier synchronization is a key programming primitive for shared-memory embedded MPSoCs. As the core count increases, software implementations cannot provide the required performance and scalability, which makes hardware acceleration critical. In this paper, we describe an interconnect extension implemented with standard cells and a mainstream industrial toolflow. We show that the area overhead is marginal relative to the performance improvement of the resulting hardware-accelerated barriers. We integrate our hardware barrier into the OpenMP programming model and compare its synchronization efficiency with that of traditional software implementations.
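
For readers unfamiliar with the software baseline that the abstract contrasts against, the following is a minimal sketch of a centralized sense-reversing barrier, one common software barrier algorithm. It is not taken from the paper: the class name `SenseReversingBarrier`, the helper `run_phases`, and the use of Python threads with a lock standing in for an atomic decrement are all assumptions made for illustration. The point it shows is that every thread updates one shared counter and then polls one shared flag, which is the kind of shared-memory traffic that tends to grow with core count.

```python
import threading
import time


class SenseReversingBarrier:
    """Hypothetical centralized barrier: one shared counter, one shared sense flag."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.count = num_threads        # threads still expected in this episode
        self.sense = False              # global sense, flipped once per episode
        self.lock = threading.Lock()    # assumption: stands in for an atomic decrement
        self.local = threading.local()  # per-thread private sense

    def wait(self):
        # Each thread flips its private sense on entry.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count -= 1
            last = self.count == 0
            if last:
                # The last arrival resets the counter, then releases everyone.
                self.count = self.num_threads
                self.sense = my_sense
        if not last:
            # Earlier arrivals poll the shared flag until the last thread flips it.
            while self.sense != my_sense:
                time.sleep(0)  # yield so other threads can make progress


def run_phases(num_threads=4, num_phases=3):
    """Run workers through several barrier episodes and record any ordering violation."""
    barrier = SenseReversingBarrier(num_threads)
    arrivals = [0] * num_phases
    violations = []
    lock = threading.Lock()

    def worker():
        for phase in range(num_phases):
            with lock:
                arrivals[phase] += 1
            barrier.wait()
            # Past the barrier, every thread must already have arrived in this phase.
            if arrivals[phase] != num_threads:
                violations.append(phase)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrivals, violations
```

Calling `run_phases(4, 3)` runs four workers through three barrier episodes; an empty `violations` list indicates that no thread passed a barrier before all threads had arrived. A hardware barrier such as the one the abstract describes would replace the counter update and flag polling with dedicated signalling, the details of which are not given in this record.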