Evaluation of a Multithreaded Architecture for Cellular Computing
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
QNoC: QoS architecture and design process for network on chip
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Networks on chip
IEEE Micro
Effective OpenMP Implementation and Translation For Multiprocessor System-On-Chip without Using OS
ASP-DAC '07 Proceedings of the 2007 Asia and South Pacific Design Automation Conference
Efficiency and scalability of barrier synchronization on NoC based many-core architectures
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Capturing topology-level implications of link synthesis techniques for nanoscale networks-on-chip
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Implementing OpenMP on a high performance embedded multicore MPSoC
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
A practical OpenMP compiler for system on chips
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs
ICPP '10 Proceedings of the 2010 39th International Conference on Parallel Processing
Efficient synchronization for embedded on-chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
TLSync: support for multiple fast barriers using on-chip transmission lines
Proceedings of the 38th annual international symposium on Computer architecture
Supporting OpenMP on a multi-cluster embedded MPSoC
Microprocessors & Microsystems
Low-Overhead, high-speed multi-core barrier synchronization
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
Barrier synchronization is a key programming primitive for shared memory embedded MPSoCs. As the core count increases, software implementations cannot provide the needed performance and scalability, thus making hardware acceleration critical. In this paper we describe an interconnect extension implemented with standard cells and with a mainstream industrial toolflow. We show that the area overhead is marginal with respect to the performance improvements of the resulting hardware-accelerated barriers. We integrate our HW barrier into the OpenMP programming model and discuss synchronization efficiency compared with traditional software implementations.