Optimizing transactions for captured memory

Authors:
Aleksandar Dragojevic; Yang Ni; Ali-Reza Adl-Tabatabai
Affiliations:
EPFL, Lausanne, Switzerland; Intel Corporation, Santa Clara, CA, USA; Intel Corporation, Santa Clara, CA, USA
Venue:
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Year:
2009

Abstract

In this paper, we identify transaction-local memory as a major source of overhead from compiler instrumentation in software transactional memory (STM). Transaction-local memory is memory that is allocated inside a transaction and cannot escape (i.e., is captured by) the allocating transaction. Accesses to such memory do not require calls to STM memory access functions (also called STM barriers). A compiler that is unaware of this, however, may translate simple load and store operations on such memory into more expensive STM barriers, which presents an opportunity to improve STM performance. Our measurements with the STAMP benchmark suite (version 0.9.9) revealed that as many as 60% of the STM barriers generated by our baseline compiler can be accesses to captured memory, including 90% of the write barriers and 45% of the read barriers. We propose runtime and compiler optimizations to elide STM barriers on captured memory. Similar techniques can also be used to elide barriers for accesses to thread-local and read-only data. We implemented these optimizations in the Intel C++ STM compiler. Our experiments with the STAMP benchmark suite on an Intel Dunnington system (24 cores in a 4-node SMP system) showed a performance improvement of up to 18% at 16 threads.
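The runtime side of this idea can be illustrated with a minimal C++ sketch. It is not the Intel STM runtime's implementation: the `TxDescriptor` type, its member names, and the linear range search are all hypothetical and chosen for brevity. The sketch assumes that the runtime records every block allocated inside a transaction, and that a barrier can then test whether an address falls inside such a block and, if so, fall back to a plain load or store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical per-transaction descriptor (illustrative only, not a real STM API).
struct TxDescriptor {
    struct Range {
        std::uintptr_t begin;  // first byte of the block
        std::uintptr_t end;    // one past the last byte
    };

    std::vector<Range> captured;  // blocks allocated inside this transaction
    std::size_t full = 0;         // accesses that took the full barrier path
    std::size_t elided = 0;       // accesses recognized as captured

    // Transactional allocation: record the block so that later accesses
    // to it can be recognized as captured.
    void* tx_malloc(std::size_t size) {
        assert(size > 0);
        void* p = std::malloc(size);
        if (p != nullptr) {
            auto b = reinterpret_cast<std::uintptr_t>(p);
            captured.push_back({b, b + size});
        }
        return p;
    }

    // True if addr lies in memory allocated by the current transaction.
    bool is_captured(const void* addr) const {
        auto a = reinterpret_cast<std::uintptr_t>(addr);
        for (const Range& r : captured) {
            if (a >= r.begin && a < r.end) return true;
        }
        return false;
    }

    // Write barrier with a captured-memory fast path.
    void write_int(int* addr, int value) {
        if (is_captured(addr)) {
            ++elided;       // no other transaction can see this memory yet
            *addr = value;  // plain store
            return;
        }
        ++full;
        // Placeholder: a real STM would acquire ownership and log here.
        *addr = value;
    }

    // Read barrier with the same fast path.
    int read_int(const int* addr) {
        if (is_captured(addr)) {
            ++elided;
            return *addr;   // plain load
        }
        ++full;
        // Placeholder: a real STM would validate or log the read here.
        return *addr;
    }

    // At commit or abort the blocks stop being captured: after commit they
    // may be reachable by other threads, so later accesses need barriers.
    void end() { captured.clear(); }
};
```

In this sketch, a transaction that allocates a node with `tx_malloc` and initializes its fields through `write_int` takes the fast path for every field, while a write to a pre-existing shared variable still takes the full path. The range check itself has a cost, so a production runtime would need a cheaper lookup than this linear scan; the compiler optimizations mentioned in the abstract avoid the check entirely when capture can be proven statically.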