Implementation tradeoffs in the design of flexible transactional memory support

Authors:
Arrvindh Shriraman; Sandhya Dwarkadas; Michael L. Scott
Affiliations:
Department of Computer Science, University of Rochester, United States; Department of Computer Science, University of Rochester, United States; Department of Computer Science, University of Rochester, United States
Venue:
Journal of Parallel and Distributed Computing
Year:
2010

Abstract

We present FlexTM (FLEXible Transactional Memory), a high-performance TM framework that allows software to determine when (eagerly, lazily, or in a mixed fashion) and how to manage conflicts, while employing hardware to manage transactional state and to track conflicts. FlexTM coordinates four decoupled hardware mechanisms: read and write signatures, which summarize per-thread access sets; per-thread conflict summary tables (CSTs), which identify the processors with which conflicts have occurred; Programmable Data Isolation, which buffers speculative updates in the local cache and uses an overflow table to handle unbounded updates; and Alert-On-Update, which notifies a thread immediately when a specified location is written by another processor. The CSTs enable an STM-inspired commit protocol that manages conflicts in a decentralized manner (no global arbitration) and allows parallel commits. We explore the implementation tradeoffs associated with FlexTM's versioning and conflict detection mechanisms. Our results demonstrate that FlexTM exhibits ~5x speedup over high-quality software TMs and ~1.8x speedup over hybrid TMs (those with software always in the loop), with no loss in policy flexibility. We find that the distributed commit protocol improves performance by 2%-14% over an aggressive centralized arbiter mechanism that also allows parallel commits. Finally, we compare two ways of handling speculative transaction state that overflows the cache: the aggressive hardware controller of the base FlexTM design, which both manages and accesses the overflowed state, and a hardware-software approach dubbed FlexTM-S (FlexTM-Streamlined), in which software manages the overflow region but uses a metadata cache to accelerate speculative data replacements and their subsequent accesses. We demonstrate that FlexTM-S's performance is within 10% of FlexTM's despite its substantially simpler virtualization mechanism.
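
As a rough illustration of how signatures and CSTs might interact, the following Python toy model tracks per-thread read and write signatures as small Bloom-filter bitmaps, records conflicts in a per-thread bitmap, and lets a committing thread abort its conflicting peers by consulting only its own bitmap. It is a sketch under stated assumptions, not the FlexTM hardware design: the signature width, the hash functions, the 64-byte line size, the single combined CST per thread, and all names (Signature, TxThread, access, commit_lazy) are hypothetical, and the abort-every-recorded-peer policy is a deliberate simplification.

```python
from dataclasses import dataclass, field

SIG_BITS = 64  # hypothetical signature width, chosen only for illustration


def _hashes(addr: int):
    """Two simple illustrative hash functions over the cache-line address."""
    line = addr >> 6  # assumption: 64-byte cache lines
    return (line % SIG_BITS, (line * 2654435761 >> 7) % SIG_BITS)


@dataclass
class Signature:
    """Bloom-filter summary of an access set (may report false positives)."""
    bits: int = 0

    def insert(self, addr: int) -> None:
        for h in _hashes(addr):
            self.bits |= 1 << h

    def may_contain(self, addr: int) -> bool:
        return all((self.bits >> h) & 1 for h in _hashes(addr))


@dataclass
class TxThread:
    tid: int
    rsig: Signature = field(default_factory=Signature)
    wsig: Signature = field(default_factory=Signature)
    cst: int = 0           # bit i set => a conflict with thread i was observed
    aborted: bool = False


def access(threads, tid: int, addr: int, is_write: bool) -> None:
    """Record an access; note conflicts in both parties' CSTs (no arbiter)."""
    me = threads[tid]
    for other in threads:
        if other.tid == tid:
            continue
        # A read conflicts with a remote write; a write conflicts with either.
        conflict = other.wsig.may_contain(addr) or (
            is_write and other.rsig.may_contain(addr)
        )
        if conflict:
            me.cst |= 1 << other.tid
            other.cst |= 1 << me.tid
    (me.wsig if is_write else me.rsig).insert(addr)


def commit_lazy(threads, tid: int) -> None:
    """Simplified lazy commit: abort every peer named in the local CST."""
    me = threads[tid]
    for other in threads:
        if (me.cst >> other.tid) & 1:
            other.aborted = True
            other.cst &= ~(1 << tid)
    me.cst = 0
    me.rsig, me.wsig = Signature(), Signature()


if __name__ == "__main__":
    ts = [TxThread(0), TxThread(1), TxThread(2)]
    access(ts, 0, 0x1000, is_write=True)   # thread 0 writes a line
    access(ts, 1, 0x1000, is_write=False)  # thread 1 reads the same line
    access(ts, 2, 0x8000, is_write=False)  # thread 2 touches another line
    commit_lazy(ts, 0)
    print([t.aborted for t in ts])
```

Because each thread consults only its own bitmap at commit time, threads whose bitmaps do not name each other can commit concurrently in this model, which is the property the abstract attributes to the decentralized protocol. An eager policy could be approximated by resolving the conflict inside access instead of deferring it to commit_lazy.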