Transactional prefetching: narrowing the window of contention in hardware transactional memory

Authors:
Anurag Negi;Adrià Armejach;Adrián Cristal;Osman S. Unsal;Per Stenstrom
Affiliations:
Chalmers University of Technology, Gothenburg, Sweden;Barcelona Supercomputing Center, Universitat Politècnica de Catalunya, Barcelona, Spain;Barcelona Supercomputing Center, IIIA - Artificial Intelligence Research Institute, Barcelona, Spain;Barcelona Supercomputing Center, Barcelona, Spain;Chalmers University of Technology, Gothenburg, Sweden
Venue:
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Year:
2012

Abstract

Memory access latency is the primary performance bottleneck in modern computer systems. Prefetching data before a processing core needs it allows substantial performance gains by overlapping significant portions of memory latency with useful work. Prior work has investigated this technique and measured its potential benefits in a variety of scenarios. However, its use in speeding up Hardware Transactional Memory (HTM) has so far remained unexplored. In several HTM designs, transactions invalidate speculatively updated cache lines when they abort. Such cache lines tend to have high locality and are likely to be accessed again when the transaction re-executes. Coarse-grained transactions that update several cache lines are particularly susceptible to performance degradation, even under moderate contention. However, such transactions show strong locality of reference, especially when contention is high. Prefetching cache lines with high locality can therefore improve overall concurrency by speeding up transactions and thereby narrowing the window of time in which such transactions persist and can cause contention. Such transactions are important since they are likely to form a common TM use case. We note that traditional prefetch techniques may not be able to track such lines adequately or issue prefetches quickly enough. This paper investigates the use of prefetching in HTMs, proposes a simple design to identify and request prefetch candidates, and measures the performance gains achievable for several representative TM workloads.
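
The effect described in the abstract can be illustrated with a toy model. The sketch below is not the design proposed in the paper; it is a minimal illustration under stated assumptions: a cache modeled as a set of line addresses, fixed hit and miss latencies (`HIT_CYCLES`, `MISS_CYCLES`), and prefetches that complete entirely off the critical path. The class `ToyCore` and the function `run_with_one_abort` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field

MISS_CYCLES = 100  # assumed cache-miss latency (illustrative value)
HIT_CYCLES = 1     # assumed cache-hit latency (illustrative value)


@dataclass
class ToyCore:
    """Hypothetical single-core model; not the paper's hardware design."""
    prefetch_on_retry: bool
    cache: set = field(default_factory=set)       # resident line addresses
    write_set: set = field(default_factory=set)   # lines written speculatively
    candidates: set = field(default_factory=set)  # lines invalidated by the last abort

    def begin(self):
        """Start (or restart) a transaction."""
        self.write_set.clear()
        if self.prefetch_on_retry:
            # Assumption: prefetches finish before the lines are needed.
            self.cache |= self.candidates
        self.candidates.clear()

    def write(self, line):
        """Speculatively write one cache line; return the cycles it cost."""
        cycles = HIT_CYCLES if line in self.cache else MISS_CYCLES
        self.cache.add(line)
        self.write_set.add(line)
        return cycles

    def abort(self):
        """Invalidate speculatively updated lines and remember them."""
        self.cache -= self.write_set
        self.candidates = set(self.write_set)
        self.write_set.clear()


def run_with_one_abort(core, lines):
    """Run a transaction, abort it, re-execute it; return the retry's cycles."""
    core.begin()
    for line in lines:
        core.write(line)
    core.abort()
    core.begin()
    return sum(core.write(line) for line in lines)
```

In this model, a retry that writes four lines costs four misses without prefetching and four hits with it, so the re-executed transaction finishes sooner and is exposed to conflicts for less time. Real prefetches overlap only part of the miss latency, so the gap in actual hardware would be smaller than this sketch suggests.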