Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
Proceedings of the 14th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Hoard: a scalable memory allocator for multithreaded applications
ACM SIGPLAN Notices
Software transactional memory for dynamic-sized data structures
Proceedings of the twenty-second annual symposium on Principles of distributed computing
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
McRT-Malloc: a scalable transactional memory allocator
Proceedings of the 5th international symposium on Memory management
Optimizing memory transactions
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Compiler and runtime support for efficient software transactional memory
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Architectural Support for Software Transactional Memory
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Enforcing isolation and ordering in STM
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Automatic data partitioning in software transactional memories
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Kicking the tires of software transactional memory: why the going gets tough
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Design and implementation of transactional constructs for C/C++
Proceedings of the 23rd ACM SIGPLAN conference on Object-oriented programming systems languages and applications
Language support and compiler optimizations for STM and transactional boosting
ICDCIT'07 Proceedings of the 4th international conference on Distributed computing and internet technology
DISC'06 Proceedings of the 20th international conference on Distributed Computing
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Why STM can be more than a research toy
Communications of the ACM
A shape analysis for optimizing parallel graph programs
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Hybrid binary rewriting for memory access instrumentation
Proceedings of the 7th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Lowering STM overhead with static analysis
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
STM with transparent API considered harmful
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
Runtime elision of transactional barriers for captured memory
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Transactionalizing legacy code: an experience report using GCC and Memcached
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Profile-guided transaction coalescing—lowering transactional overheads by merging transactions
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.02 |
In this paper, we identify transaction-local memory as a major source of overhead from compiler instrumentation in software transactional memory (STM). Transaction-local memory is memory allocated inside a transaction, which cannot escape (i.e., is captured by) the allocating transaction. Accesses to such memory do not require calls to STM memory access functions (also called STM barriers). A compiler unaware of that, however, may translate simple memory load/store operations accessing such memory into more expensive STM barriers. This presents us opportunities to improve STM performance. Our measurements with the STAMP benchmark suite (version 0.9.9) revealed that as many as 60% of the STM barriers generated by our baseline compiler can be accesses to captured memory, which include 90% of the write barriers and 45% of the read barriers. We propose runtime and compiler optimizations to elide STM barriers to captured memory. Similar techniques can also be used to elide barriers for accesses to thread-local and read-only data. We implemented those optimizations in the Intel C++ STM compiler. Our experiments with the STAMP benchmark suite on a Intel Dunnington system (with 24 cores in a 4-node SMP system) showed that upto 18% performance improvement could be achieved at 16 threads.