Fast synchronization for chip multiprocessors

Authors:
Jack Sampson;Rubén González;Jean-Francois Collard;Norman P. Jouppi;Mike Schlansker
Affiliations:
UCSD;UPC Barcelona;Hewlett-Packard Laboratories, Palo Alto, California;Hewlett-Packard Laboratories, Palo Alto, California;Hewlett-Packard Laboratories, Palo Alto, California
Venue:
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Year:
2005

Citing (12)

Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems (TOCS)
Fast barrier synchronization hardware. Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The network architecture of the Connection Machine CM-5 (extended abstract). SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
System-on-a-chip processor synchronization support in hardware. Proceedings of the conference on Design, automation and test in Europe
Parallel Computer Architecture: A Hardware/Software Approach
Supporting Fine-Grained Synchronization on a Simultaneous Multithreading Processor. HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Multicores from the Compiler's Perspective: A Blessing or a Curse? Proceedings of the international symposium on Code generation and optimization
Niagara: A 32-Way Multithreaded Sparc Processor. IEEE Micro
High-Performance Throughput Computing. IEEE Micro
IBM Power5 Chip: A Dual-Core Multithreaded Processor. IEEE Micro
Packaging the Blue Gene/L supercomputer. IBM Journal of Research and Development
Design and implementation of message-passing services for the Blue Gene/L supercomputer. IBM Journal of Research and Development

Cited (3)

Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms. CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
A bridging model for multi-core computing. Journal of Computer and System Sciences
Measurement of the latency parameters of the Multi-BSP model: a multicore benchmarking approach. The Journal of Supercomputing


Abstract

This paper presents a novel mechanism for barrier synchronization on chip multiprocessors (CMPs). By forcing the invalidation of selected I-cache lines, the mechanism starves threads of instructions and thus stalls their execution. Threads are released once all of them have entered the barrier. We evaluated this mechanism using SMTSim and report much better (and, most importantly, flatter) performance than the lock-based barriers supported by existing microprocessors.
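
For orientation, the kind of lock-based software barrier the abstract compares against can be sketched as follows. This is a minimal, illustrative centralized sense-reversing barrier in Python; it is not the paper's I-cache mechanism (which relies on hardware and cannot be expressed in portable software), and the names `SenseReversingBarrier` and `run_phases` are hypothetical. It also blocks on a condition variable where a real lock-based barrier on a CMP would typically spin on shared memory.

```python
import threading


class SenseReversingBarrier:
    """Centralized lock-based barrier (illustrative software baseline only)."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.count = 0        # threads that have arrived in the current episode
        self.sense = False    # global sense, flipped once per barrier episode
        self.cond = threading.Condition()  # owns the lock guarding count/sense

    def wait(self):
        with self.cond:
            # Value the global sense will take once this episode completes.
            local_sense = not self.sense
            self.count += 1
            if self.count == self.num_threads:
                # Last arrival: reset the counter and release all waiters.
                self.count = 0
                self.sense = local_sense
                self.cond.notify_all()
            else:
                # Earlier arrivals block until the last thread flips the sense.
                while self.sense != local_sense:
                    self.cond.wait()


def run_phases(num_threads=4, num_phases=3):
    """Each thread logs (phase, thread_id), then waits at the barrier."""
    barrier = SenseReversingBarrier(num_threads)
    log = []
    log_lock = threading.Lock()

    def worker(tid):
        for phase in range(num_phases):
            with log_lock:
                log.append((phase, tid))
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In this sketch every arriving thread must acquire the same lock and update the same counter, so arrivals are serialized; that shared-counter contention is one commonly cited reason software barriers slow down as the thread count grows.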