Wait-n-GoTM: improving HTM performance by serializing cyclic dependencies

Authors:
Syed Ali Raza Jafri;Gwendolyn Voskuilen;T. N. Vijaykumar
Affiliations:
Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA
Venue:
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Year:
2013

References (26):
Software transactional memory. Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing.
Dynamic speculation and synchronization of data dependences. Proceedings of the 24th annual international symposium on Computer architecture.
Simics: A Full System Simulation Platform. Computer.
Variability in Architectural Simulations of Multi-Threaded Workloads. HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture.
Transactional Memory Coherence and Consistency. Proceedings of the 31st annual international symposium on Computer architecture.
Unbounded Transactional Memory. HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture.
Virtualizing Transactional Memory. Proceedings of the 32nd annual international symposium on Computer Architecture.
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset. ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05.
Compiler and runtime support for efficient software transactional memory. Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation.
Hybrid transactional memory. Proceedings of the 12th international conference on Architectural support for programming languages and operating systems.
Unbounded page-based transactional memory. Proceedings of the 12th international conference on Architectural support for programming languages and operating systems.
Making the fast case common and the uncommon case simple in unbounded transactional memory. Proceedings of the 34th annual international symposium on Computer architecture.
An effective hybrid transactional memory system with strong isolation guarantees. Proceedings of the 34th annual international symposium on Computer architecture.
Performance pathologies in hardware transactional memory. Proceedings of the 34th annual international symposium on Computer architecture.
LogTM-SE: Decoupling Hardware Transactional Memory from Caches. HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
Using Hardware Memory Protection to Build a High-Performance, Strongly-Atomic Hybrid Transactional Memory. ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture.
TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory. ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture.
Flexible Decoupled Transactional Memory Support. ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture.
Maximum benefit from a minimal HTM. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.
Early experience with a commercial hardware transactional memory implementation. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.
Dependence-aware transactional memory for increased concurrency. Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture.
Proactive transaction scheduling for contention management. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture.
Timetraveler: exploiting acyclic races for optimizing memory race recording. Proceedings of the 37th annual international symposium on Computer architecture.
RETCON: transactional repair without replay. Proceedings of the 37th annual international symposium on Computer architecture.
ASF: AMD64 Extension for Lock-Free Data Structures and Transactional Memory. MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
Evaluation of Blue Gene/Q hardware support for transactional memories. Proceedings of the 21st international conference on Parallel architectures and compilation techniques.

Cited by (1):
Techniques to improve performance in requester-wins hardware transactional memory. ACM Transactions on Architecture and Code Optimization (TACO).

Abstract

Transactional memory (TM) has been proposed to alleviate some key programmability problems in chip multiprocessors. Most TMs optimistically allow concurrent transactions, detecting read-write or write-write conflicts. Upon conflicts, existing hardware TMs (HTMs) use one of three conflict-resolution policies: (1) always-abort, (2) always-wait for some conflicting transactions to complete, or (3) always-go past conflicts, resolving acyclic conflicts at commit and aborting upon cyclic dependencies. While each policy has advantages, the policies degrade performance under contention by limiting concurrency (always-abort, always-wait) or incurring late aborts due to cyclic dependencies (always-go). Thus, while always-go avoids acyclic aborts, no policy avoids cyclic aborts. We propose Wait-n-GoTM (WnGTM) to increase concurrency while avoiding cyclic aborts. We observe that most cyclic dependencies are caused by threads interleaving multiple accesses to a few heavily-read-write-shared delinquent data cache blocks. These accesses occur in code sections called cycle inducer sections (CISTs). Accordingly, we propose Wait-n-Go (WnG) conflict-resolution to avoid many cyclic aborts by predicting and serializing the CISTs. To support the WnG policy, we extend previous HTMs to (1) allow multiple readers and writers, (2) scalably identify dependencies, and (3) detect cyclic dependencies via new mechanisms, namely, conflict transactional state, order-capture, and hardware timestamps, respectively. In 16-core simulations of STAMP, WnGTM achieves average speedups over always-abort (TokenTM) of 46% for higher-contention benchmarks and 28% for all benchmarks, with low-contention benchmarks remaining unchanged. By comparison, always-go (DATM) performs worse than TokenTM, and always-wait (LogTM-SE) performs 6% better.
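The observation about interleaved accesses can be illustrated with a small software model. The sketch below is not the paper's hardware mechanism (it uses neither conflict transactional state, order-capture, nor hardware timestamps); it is a toy, under the assumption that each conflicting access pair on the same block orders the earlier transaction before the later one. The names `build_dependencies`, `has_cycle`, `interleaved`, and `serialized` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def build_dependencies(schedule):
    """Return edges A -> B meaning transaction A must commit before B.

    `schedule` is a list of (transaction, op, block) tuples in global
    access order, where op is "R" or "W". Two accesses conflict when
    they touch the same block, come from different transactions, and
    at least one of them is a write.
    """
    edges = defaultdict(set)
    for i, (txn_a, op_a, block_a) in enumerate(schedule):
        for txn_b, op_b, block_b in schedule[i + 1:]:
            conflict = block_a == block_b and "W" in (op_a, op_b)
            if conflict and txn_a != txn_b:
                edges[txn_a].add(txn_b)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the dependency graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (unvisited)

    def visit(node):
        color[node] = GRAY
        for succ in edges.get(node, ()):
            if color[succ] == GRAY:  # back edge: cycle found
                return True
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))


# T1 reads x, T2 writes x, then T1 writes x: the accesses interleave.
interleaved = [("T1", "R", "x"), ("T2", "W", "x"), ("T1", "W", "x")]
# Same accesses, but T1's section on x finishes before T2 touches x.
serialized = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "W", "x")]

print(has_cycle(build_dependencies(interleaved)))  # cyclic: T1 -> T2 -> T1
print(has_cycle(build_dependencies(serialized)))   # acyclic: only T1 -> T2
```

In this toy model, the interleaved schedule forces an abort under an always-go policy because neither commit order satisfies both dependencies, whereas serializing the section that touches `x` leaves a single acyclic dependency that can be resolved at commit.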