Efficiency and scalability of barrier synchronization on NoC based many-core architectures

Authors:
Oreste Villa;Gianluca Palermo;Cristina Silvano
Affiliations:
Pacific Northwest National Laboratory, Richland, WA, USA;Politecnico di Milano, Milano, Italy;Politecnico di Milano, Milano, Italy
Venue:
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Year:
2008

Abstract

Interconnects based on Networks-on-Chip (NoCs) are an appealing solution for future microprocessor designs, in which hundreds of cores will very likely be connected on a single chip. Barrier primitives, used to synchronize the execution of parallel processes, will play a fundamental role in highly parallel applications running on many-core architectures. This paper analyzes the efficiency and scalability of different barrier implementations in NoC-based many-core architectures. Several message-passing barrier implementations based on four algorithms (all-to-all, master-slave, butterfly and tree) have been implemented and evaluated for a single-chip target architecture with a variable number of cores (from 4 to 128) and different network topologies (mesh, torus, ring, clustered-ring and fat-tree). Using a cycle-accurate simulator, we show the scalability of each barrier on every NoC topology, comparing theoretical and observed behavior. We observed that some barrier algorithms, when implemented in hardware or software, scale differently from what theory predicts. We evaluate the efficiency of each topology-barrier combination, demonstrating that, in many cases, simple network topologies can be more efficient than complex, highly connected ones.
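
To make the "theoretically expected" scaling concrete, the following minimal Python sketch tabulates message counts and communication rounds for the four barrier algorithms under an idealized textbook model. It is not the paper's cost model or simulator. The function name `barrier_cost`, the power-of-two restriction on the core count, the binary-tree arrangement, and the assumption that every message in a round is delivered in parallel with no contention are all illustrative assumptions.

```python
def barrier_cost(algorithm: str, n: int) -> tuple[int, int]:
    """Return (messages, rounds) for one barrier episode on n cores.

    Idealized model (an assumption, not taken from the paper): every message
    of a round travels in parallel, and network topology, contention and
    receiver-side serialization are all ignored.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two and at least 2")
    stages = n.bit_length() - 1  # log2(n) for a power of two

    if algorithm == "all-to-all":
        # Every core notifies every other core directly.
        return n * (n - 1), 1
    if algorithm == "master-slave":
        # n-1 arrival messages to the master, then n-1 release messages.
        return 2 * (n - 1), 2
    if algorithm == "tree":
        # Arrivals combine up a tree with n-1 edges, releases flow back down.
        return 2 * (n - 1), 2 * stages
    if algorithm == "butterfly":
        # In stage s, core i exchanges a message with core i XOR 2**s.
        return n * stages, stages
    raise ValueError(f"unknown algorithm: {algorithm}")


if __name__ == "__main__":
    for n in (4, 16, 64, 128):
        for name in ("all-to-all", "master-slave", "butterfly", "tree"):
            messages, rounds = barrier_cost(name, n)
            print(f"n={n:3d} {name:12s} messages={messages:6d} rounds={rounds}")
```

Under this model, all-to-all and master-slave need the fewest rounds, yet they concentrate traffic: quadratic message volume in the first case, a single hot-spot receiver in the second. The model ignores both effects, and they are the kind of cost that a cycle-accurate simulation on a concrete topology can expose.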