Fast synchronization on shared-memory multiprocessors: An architectural approach

Authors:
Zhen Fang; Lixin Zhang; John B. Carter; Liqun Cheng; Michael Parker
Affiliations:
School of Computing, University of Utah, Salt Lake City, UT 84112, USA; IBM Austin Research Laboratory, 11400 Burnet Road, MS 904/6C019, Austin, TX 78758, USA; School of Computing, University of Utah, Salt Lake City, UT 84112, USA; School of Computing, University of Utah, Salt Lake City, UT 84112, USA; Cray, Inc., 1050 Lowater Road, Chippewa Falls, WI 54729, USA
Venue:
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Year:
2005

Citing (15 references):
Distributing Hot-Spot Addressing in Large-Scale Multiprocessors. IEEE Transactions on Computers.
I-structures: data structures for parallel computing. ACM Transactions on Programming Languages and Systems (TOPLAS).
A scalable implementation of barrier synchronization using an adaptive combining tree. International Journal of Parallel Programming.
Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Transactions on Computer Systems (TOCS).
Fast, contention-free combining tree barriers for shared-memory multiprocessors. International Journal of Parallel Programming.
Distributed Hardwired Barrier Synchronization for Scalable Multiprocessor Clusters. IEEE Transactions on Parallel and Distributed Systems.
Synchronization and communication in the T3E multiprocessor. Proceedings of the seventh international conference on Architectural support for programming languages and operating systems.
Coherence controller architectures for SMP-based CC-NUMA multiprocessors. Proceedings of the 24th annual international symposium on Computer architecture.
The SGI Origin: a ccNUMA highly scalable server. Proceedings of the 24th annual international symposium on Computer architecture.
MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems. IEEE Transactions on Parallel and Distributed Systems.
The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors. International Journal of Parallel Programming.
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors. IEEE Transactions on Parallel and Distributed Systems.
Efficient Barrier Using Remote Memory Operations on VIA-Based Clusters. CLUSTER '02 Proceedings of the IEEE International Conference on Cluster Computing.
Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation. IEEE Transactions on Computers.
The NYU Ultracomputer: Designing an MIMD Shared Memory Parallel Computer. IEEE Transactions on Computers.

Cited by (2):
Thread-parallel MPEG-2 and MPEG-4 encoders for shared-memory System-on-Chip multiprocessors. International Journal of Computers and Applications.
Active memory controller. The Journal of Supercomputing.


Abstract

Synchronization is a crucial operation in many parallel applications. Conventional synchronization mechanisms are failing to keep up with the increasing demand for efficient synchronization operations as systems grow larger and network latency increases. The contributions of this paper are threefold. First, we revisit some representative synchronization algorithms in light of recent architectural innovations and provide an example of how the simplifying assumptions made by typical analytical models of synchronization mechanisms can lead to significant errors in performance estimates. Second, we present an architectural innovation called active memory that enables very fast atomic operations in a shared-memory multiprocessor. Third, we use execution-driven simulation to quantitatively compare the performance of a variety of synchronization mechanisms based on both existing hardware techniques and active memory operations. To the best of our knowledge, synchronization based on active memory outperforms all existing spinlock and non-hardwired barrier implementations by a large margin.
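
For readers unfamiliar with the kind of software barrier the abstract refers to, the following is a minimal sketch of a centralized sense-reversing barrier, a well-known conventional design built on an atomic fetch-and-add. It is not the paper's active memory mechanism, and it is not claimed to be one of the implementations the authors measured. The names `AtomicCounter`, `SenseReversingBarrier`, and `run_demo` are hypothetical, and the atomic operation is emulated with a lock because Python exposes no hardware fetch-and-add.

```python
import threading
import time


class AtomicCounter:
    """Hypothetical stand-in for a hardware fetch-and-add on a shared word.

    A lock emulates atomicity; on a real multiprocessor this would be a
    single atomic instruction issued to shared memory.
    """

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        # Return the old value and add delta, as one indivisible step.
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def store(self, value):
        with self._lock:
            self._value = value


class SenseReversingBarrier:
    """Centralized barrier: every thread updates one shared counter."""

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.count = AtomicCounter(0)
        self.sense = False                 # shared flag all waiters spin on
        self._local = threading.local()    # per-thread sense

    def wait(self):
        # Each thread flips its private sense on every barrier episode.
        my_sense = not getattr(self._local, "sense", False)
        self._local.sense = my_sense
        if self.count.fetch_and_add(1) == self.n_threads - 1:
            # Last arriver: reset the counter first, then release the others.
            self.count.store(0)
            self.sense = my_sense
        else:
            # Spin until the last arriver flips the shared sense.
            while self.sense != my_sense:
                time.sleep(0)              # yield so other threads can run


def run_demo(n_threads=4, rounds=3):
    """Each thread logs (round, thread id), then waits at the barrier."""
    barrier = SenseReversingBarrier(n_threads)
    log = []

    def worker(tid):
        for r in range(rounds):
            log.append((r, tid))
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    for entry in run_demo():
        print(entry)
```

In this sketch every thread updates the same counter and spins on the same flag, so all synchronization traffic targets one memory location. That single shared location is the kind of contention point that becomes more costly as processor counts and network latency grow.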