Algorithms for scalable synchronization on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Fast barrier synchronization hardware
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The network architecture of the Connection Machine CM-5 (extended abstract)
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
System-on-a-chip processor synchronization support in hardware
Proceedings of the conference on Design, automation and test in Europe
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Supporting Fine-Grained Synchronization on a Simultaneous Multithreading Processor
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Multicores from the Compiler's Perspective: A Blessing or a Curse?
Proceedings of the international symposium on Code generation and optimization
High-Performance Throughput Computing
IEEE Micro
Packaging the Blue Gene/L supercomputer
IBM Journal of Research and Development
Design and implementation of message-passing services for the Blue Gene/L supercomputer
IBM Journal of Research and Development
Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
A bridging model for multi-core computing
Journal of Computer and System Sciences
Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach
The Journal of Supercomputing
Hi-index | 0.00 |
This paper presents a novel mechanism for barrier synchronization on chip multi-processors (CMPs). By forcing the invalidation of selected I-cache lines, this mechanism starves threads and thus forces their execution to stop. Threads are let free when all have entered the barrier.We evaluated this mechanism using SMTSim and report much better (and most importantly, more flat) performance than lock-based barriers supported by existing microprocessors.