Implementation tradeoffs in the design of flexible transactional memory support

  • Authors: Arrvindh Shriraman; Sandhya Dwarkadas; Michael L. Scott

  • Affiliations: Department of Computer Science, University of Rochester, United States (all authors)

  • Venue: Journal of Parallel and Distributed Computing
  • Year: 2010

Abstract

We present FlexTM (FLEXible Transactional Memory), a high-performance TM framework that allows software to determine when (eagerly, lazily, or in a mixed fashion) and how to manage conflicts, while employing hardware to manage transactional state and to track conflicts. FlexTM coordinates four decoupled hardware mechanisms: read and write signatures, which summarize per-thread access sets; per-thread conflict summary tables (CSTs), which identify the processors with which conflicts have occurred; Programmable Data Isolation, which buffers speculative updates in the local cache and uses an overflow table to handle unbounded updates; and Alert-On-Update, which notifies a thread immediately when a specified location is written by another processor. The CSTs enable an STM-inspired commit protocol that manages conflicts in a decentralized manner (no global arbitration) and allows parallel commits. We explore the implementation tradeoffs associated with FlexTM's versioning and conflict detection mechanisms. Our results demonstrate that FlexTM exhibits ~5x speedup over high-quality software TMs and ~1.8x speedup over hybrid TMs (those with software always in the loop), with no loss in policy flexibility. We find that the distributed commit protocol improves performance by 2%-14% over an aggressive centralized arbiter mechanism that also allows parallel commits. Finally, we compare two ways of managing and accessing speculative transaction state that has overflowed from the cache: an aggressive hardware controller (as used in the base FlexTM design), and a hardware-software approach dubbed FlexTM-S (FlexTM-Streamlined), in which software manages the overflow region but uses a metadata cache to accelerate speculative data replacements and their subsequent accesses. We demonstrate that FlexTM-S's performance is within 10% of FlexTM's despite its substantially simpler virtualization mechanism.
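
To make the roles of signatures and CSTs more concrete, the following is a minimal software sketch, not the paper's hardware design. It assumes that signatures behave like small Bloom filters over addresses and that each thread keeps a single CST bitmask with one bit per peer. The names `Signature`, `Thread`, `record_access`, and `conflicting_peers`, the 256-bit signature width, and the hash choice are all illustrative assumptions and do not come from the abstract.

```python
import hashlib

SIG_BITS = 256  # assumed signature width; the abstract does not specify one


def _bit_positions(addr, k=2):
    """Map an address to k bit positions (hypothetical hash scheme)."""
    digest = hashlib.sha256(addr.to_bytes(8, "little")).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % SIG_BITS
            for i in range(k)]


class Signature:
    """Bloom-filter-style summary of an access set (may report false positives)."""

    def __init__(self):
        self.bits = 0

    def insert(self, addr):
        for p in _bit_positions(addr):
            self.bits |= 1 << p

    def may_contain(self, addr):
        return all((self.bits >> p) & 1 for p in _bit_positions(addr))


class Thread:
    """Per-thread transactional metadata in this sketch."""

    def __init__(self, tid):
        self.tid = tid
        self.rsig = Signature()  # summary of addresses read
        self.wsig = Signature()  # summary of addresses written
        self.cst = 0             # bit i set => a conflict with thread i was seen


def record_access(threads, tid, addr, is_write):
    """Check peers' signatures, note conflicts in both CSTs, then log the access."""
    me = threads[tid]
    for other in threads:
        if other.tid == tid:
            continue
        # Any access conflicts with a peer's write; a write also conflicts
        # with a peer's read.
        hit = other.wsig.may_contain(addr) or (
            is_write and other.rsig.may_contain(addr))
        if hit:
            me.cst |= 1 << other.tid
            other.cst |= 1 << me.tid
    (me.wsig if is_write else me.rsig).insert(addr)


def conflicting_peers(thread):
    """Decode the CST bitmask into a list of peer thread ids."""
    return [i for i in range(thread.cst.bit_length()) if (thread.cst >> i) & 1]
```

In this sketch a committing thread would only need to inspect `conflicting_peers(me)` to learn which peers it must resolve conflicts with, which is the sense in which a decentralized commit can avoid a global arbiter. How those conflicts are then resolved (eagerly, lazily, or mixed) is left to software policy and is not modeled here.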