Architectural Support for Fair Reader-Writer Locking

Authors:
Enrique Vallejo;Ramon Beivide;Adrian Cristal;Tim Harris;Fernando Vallejo;Osman Unsal;Mateo Valero
Venue:
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2010

Abstract

Many shared-memory parallel systems use lock-based synchronization mechanisms to provide mutual exclusion or reader-writer access to memory locations. Software locks are inefficient in memory usage, lock transfer time, or both. Proposed hardware locking mechanisms are too specific (for example, requiring static assignment of threads to cores and vice versa), support a limited number of concurrent locks, require tag values to be associated with every memory location, rely on the low latencies of single-chip multicore designs, or are slow in adversarial cases such as suspended threads in a lock queue. Additionally, few proposals cover reader-writer locks and their associated fairness issues. In this paper we introduce the Lock Control Unit (LCU), an acceleration mechanism collocated with each core that explicitly handles fast reader-writer locking. By associating a unique thread-id with each lock request, we decouple the hardware lock from the requestor core, which provides correct and efficient execution in the presence of thread migration. Because the LCU logic is autonomous from the core, it seamlessly handles thread preemption. Our design offers richer semantics than previous proposals, such as trylock support, while providing direct core-to-core transfers. We evaluate our proposal with microbenchmarks, a fine-grain Software Transactional Memory system, and programs from the PARSEC and SPLASH parallel benchmark suites. Lock transfer time decreases by up to 30% compared to previous hardware proposals. Transactional Memory systems limited by reader-locking congestion speed up by up to 3x while still preserving graceful fairness and starvation freedom. Finally, commonly used applications achieve speedups of up to 7% compared to software models.
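To make the semantics named in the abstract concrete (reader-writer access, fairness, trylock, and requests identified by thread-id rather than by core), here is a minimal software sketch. It is not the LCU hardware design. It is a single-threaded toy model, and the class name `FairRWLockModel`, its methods, and the strict first-come-first-served policy are all assumptions chosen for illustration.

```python
from collections import deque


class FairRWLockModel:
    """Toy, single-threaded model of a FIFO-fair reader-writer lock.

    Hypothetical illustration only: requests are keyed by a thread id,
    not by a core, so the model does not care where a thread runs.
    """

    def __init__(self):
        self.queue = deque()  # waiting (thread_id, mode) requests, in arrival order
        self.holders = {}     # thread_id -> mode ("r" or "w") for current owners

    def _compatible(self, mode):
        # A writer needs the lock free; a reader may join other readers.
        if not self.holders:
            return True
        return mode == "r" and all(m == "r" for m in self.holders.values())

    def _grant(self):
        # Grant strictly from the head of the queue, so a waiting writer
        # is never overtaken by readers that arrived later.
        while self.queue and self._compatible(self.queue[0][1]):
            tid, mode = self.queue.popleft()
            self.holders[tid] = mode

    def request(self, tid, mode):
        """Enqueue a request; return True if it was granted immediately."""
        self.queue.append((tid, mode))
        self._grant()
        return tid in self.holders

    def try_lock(self, tid, mode):
        """Acquire only if that is possible right now, without queueing."""
        if self.queue or not self._compatible(mode):
            return False
        self.holders[tid] = mode
        return True

    def release(self, tid):
        """Drop ownership and hand the lock to the next waiting request(s)."""
        del self.holders[tid]
        self._grant()
```

In this sketch, a reader that arrives after a queued writer waits behind it even though other readers currently hold the lock. That ordering is one simple way to obtain starvation freedom for writers; the paper's actual fairness policy may differ.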