Parallel programming: techniques and applications using networked workstations and parallel computers
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
Hardware- and Software-Based Collective Communication on the Quadrics Network
NCA '01 Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA'01)
Low-Latency Virtual-Channel Routers for On-Chip Networks
Proceedings of the 31st annual international symposium on Computer architecture
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
The implications of working set analysis on supercomputing memory hierarchy design
Proceedings of the 19th annual international conference on Supercomputing
A survey of research and practices of Network-on-chip
ACM Computing Surveys (CSUR)
A methodology for design of application specific deadlock-free routing algorithms for NoC systems
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Efficient synchronization for embedded on-chip multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Scalability of relaxed consistency models in NoC based multicore architectures
ACM SIGARCH Computer Architecture News
Low-cost and energy-efficient distributed synchronization for embedded multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Supporting OpenMP on a multi-cluster embedded MPSoC
Microprocessors & Microsystems
Low-Overhead, high-speed multi-core barrier synchronization
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Towards network-on-chip agreement protocols
Proceedings of the tenth ACM international conference on Embedded software
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Interconnects based on Networks-on-Chip are an appealing solution to address future microprocessor designs where, very likely, hundreds of cores will be connected on a single chip. A fundamental role in highly parallelized applications running on many-core architectures will be played by barrier primitives used to synchronize the execution of parallel processes. This paper focuses on the analysis of the efficiency and scalability of different barrier implementations in many-core architectures based on NoCs. Several message passing barrier implementations based on four algorithms (all-to-all, master-slave, butterfly and tree) have been implemented and evaluated for a single-chip target architecture composed of a variable number of cores (from 4 to 128) and different network topologies (mesh, torus, ring, clustered-ring and fat-tree). Using a cycle-accurate simulator, we show the scalability of each barrier for every NoC topology, analyzing and comparing theoretical with real behaviors. We observed that some barrier algorithms, when implemented in hardware or software, show a different scaling behavior with respect to those theoretically expected. We evaluate the efficiency of each combination topology-barrier, demonstrating that, in many cases, simple network topologies can be more efficient than complex and highly connected topologies.