Speculative lock elision: enabling highly concurrent multithreaded execution

Authors:
Ravi Rajwar; James R. Goodman
Affiliations:
University of Wisconsin-Madison, Madison, WI; University of Wisconsin-Madison, Madison, WI
Venue:
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Year:
2001


Abstract

Serialization of threads due to critical sections is a fundamental bottleneck to achieving high performance in multithreaded programs. Dynamically, such serialization may be unnecessary because these critical sections could have safely executed concurrently without locks. Current processors cannot fully exploit such parallelism because they do not have mechanisms to dynamically detect such false inter-thread dependences.

We propose Speculative Lock Elision (SLE), a novel micro-architectural technique to remove dynamically unnecessary lock-induced serialization and enable highly concurrent multithreaded execution. The key insight is that locks do not always have to be acquired for a correct execution. Synchronization instructions are predicted as being unnecessary and elided. This allows multiple threads to concurrently execute critical sections protected by the same lock. Misspeculation due to inter-thread data conflicts is detected using existing cache mechanisms and rollback is used for recovery. Successful speculative elision is validated and committed without acquiring the lock.

SLE can be implemented entirely in microarchitecture without instruction set support and without system-level modifications, is transparent to programmers, and requires only trivial additional hardware support. SLE can provide programmers a fast path to writing correct high-performance multithreaded programs.
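
The elide, detect, rollback, commit cycle described in the abstract can be illustrated with a small software analogy. This is only a sketch under stated assumptions, not the hardware mechanism the paper proposes: the class `ElidableLock`, its `run` method, the `load`/`store` callbacks, and the per-key version counters are all hypothetical names invented here, with the version counters standing in for the cache coherence state that SLE uses to detect conflicts. Unlike the hardware, the toy also takes a short internal mutex to make validation and commit atomic.

```python
import threading


class ElidableLock:
    """Toy software analogy of speculative lock elision (hypothetical sketch)."""

    def __init__(self):
        # One mutex plays two roles in this toy: it is held briefly to make
        # validate-and-commit atomic, and held for the whole critical section
        # on the fallback path (the "real" lock acquisition).
        self._mutex = threading.Lock()
        self.data = {}       # shared memory: key -> value
        self.version = {}    # key -> write counter (stand-in for coherence state)
        self.elided = 0      # critical sections committed speculatively
        self.aborted = 0     # speculative attempts rolled back
        self.fallbacks = 0   # critical sections run while holding the lock

    def run(self, critical_section, max_attempts=3):
        for _ in range(max_attempts):
            reads, writes = {}, {}  # read set (key -> version seen), write buffer

            def load(key):
                if key in writes:
                    return writes[key]
                # Record the version before reading the value, so any later
                # write to this key is caught at validation time.
                reads.setdefault(key, self.version.get(key, 0))
                return self.data.get(key, 0)

            def store(key, value):
                writes[key] = value  # buffered, invisible to other threads

            critical_section(load, store)

            with self._mutex:
                if all(self.version.get(k, 0) == v for k, v in reads.items()):
                    # No conflicting write was observed: commit the buffer.
                    for k, value in writes.items():
                        self.data[k] = value
                        self.version[k] = self.version.get(k, 0) + 1
                    self.elided += 1
                    return
                # Conflict: discard the buffered writes (rollback) and retry.
                self.aborted += 1

        # Speculation kept failing (or was disabled): run under the lock.
        with self._mutex:
            def load_direct(key):
                return self.data.get(key, 0)

            def store_direct(key, value):
                self.data[key] = value
                self.version[key] = self.version.get(key, 0) + 1

            critical_section(load_direct, store_direct)
            self.fallbacks += 1


def increment(load, store):
    store("counter", load("counter") + 1)


if __name__ == "__main__":
    lock = ElidableLock()
    workers = [
        threading.Thread(target=lambda: [lock.run(increment) for _ in range(500)])
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(lock.data["counter"], lock.elided, lock.aborted, lock.fallbacks)
```

In this sketch a critical section may observe values from different commits before it is aborted, and a write to a key that was never read commits without validation; a real design has to handle both cases. The retry limit followed by a true lock acquisition mirrors the general idea that speculation is an optimization and the lock remains the correctness fallback.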