Hierarchical Backoff Locks for Nonuniform Communication Architectures

Authors:
Zoran Radovic;Erik Hagersten
Venue:
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Year:
2003



Abstract

This paper identifies node affinity as an important property for scalable general-purpose locks. Nonuniform communication architectures (NUCAs), for example CC-NUMAs built from a few large nodes or from chip multiprocessors (CMPs), have a lower penalty for reading data from a neighbor's cache than from a remote cache. Lock implementations that encourage handing over locks to neighbors will improve the lock handover time, as well as the access to the critical data guarded by the lock, but will also be vulnerable to starvation. We propose a set of simple software-based hierarchical backoff locks (HBO) that create node affinity in NUCAs. A solution for lowering the risk of starvation is also suggested. The HBO locks are compared with other software-based lock implementations using simple benchmarks, and are shown to be very competitive for uncontested locks while being more than twice as fast for contended locks. An application study also demonstrates superior performance for applications with high lock contention and competitive performance for other programs.
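
The node-affinity idea in the abstract can be illustrated with a minimal C11 sketch. It is not the authors' code. It assumes that the lock word records the node id of the current holder and that a waiter on the holder's node backs off for a shorter time than a waiter on a remote node, so a neighbor is more likely to win the next handover. The names `hbo_lock_t`, `hbo_try_acquire`, `hbo_acquire`, `hbo_release`, and all backoff constants are hypothetical, and the starvation countermeasure mentioned in the abstract is not modeled here.

```c
#include <assert.h>
#include <stdatomic.h>

#define HBO_FREE (-1)            /* assumed sentinel: lock has no holder */

/* Hypothetical tuning constants (spin iterations), not taken from the paper. */
#define BACKOFF_LOCAL_MIN    4
#define BACKOFF_LOCAL_MAX    64
#define BACKOFF_REMOTE_MIN   64
#define BACKOFF_REMOTE_MAX   1024

/* The lock word stores the node id of the holder, or HBO_FREE. */
typedef struct { atomic_int holder_node; } hbo_lock_t;

static void hbo_init(hbo_lock_t *l) {
    atomic_init(&l->holder_node, HBO_FREE);
}

/* Busy-wait for roughly `iters` loop iterations. */
static void spin_delay(int iters) {
    for (volatile int i = 0; i < iters; i++) { }
}

/* One acquisition attempt. Returns HBO_FREE on success,
 * otherwise the node id of the thread currently holding the lock. */
static int hbo_try_acquire(hbo_lock_t *l, int my_node) {
    int expected = HBO_FREE;
    if (atomic_compare_exchange_strong_explicit(
            &l->holder_node, &expected, my_node,
            memory_order_acquire, memory_order_relaxed))
        return HBO_FREE;
    return expected;             /* on failure, CAS wrote the observed value here */
}

/* Spin until the lock is acquired. Waiters that share a node with the
 * holder retry sooner than remote waiters, which creates node affinity. */
static void hbo_acquire(hbo_lock_t *l, int my_node) {
    int local = BACKOFF_LOCAL_MIN;
    int remote = BACKOFF_REMOTE_MIN;
    for (;;) {
        int holder = hbo_try_acquire(l, my_node);
        if (holder == HBO_FREE)
            return;              /* we now hold the lock */
        if (holder == my_node) {
            spin_delay(local);   /* neighbor holds it: short backoff */
            if (local < BACKOFF_LOCAL_MAX) local *= 2;
        } else {
            spin_delay(remote);  /* remote node holds it: long backoff */
            if (remote < BACKOFF_REMOTE_MAX) remote *= 2;
        }
    }
}

static void hbo_release(hbo_lock_t *l) {
    assert(atomic_load_explicit(&l->holder_node, memory_order_relaxed) != HBO_FREE);
    atomic_store_explicit(&l->holder_node, HBO_FREE, memory_order_release);
}
```

In this sketch a thread would call `hbo_acquire(&lock, node_id)` before its critical section and `hbo_release(&lock)` afterwards, where `node_id` is obtained from the platform in some way not shown here. Because remote waiters poll less often, the lock tends to stay within one node under contention, which is also why such a scheme can starve remote nodes unless an additional safeguard is added.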