Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Synchronization transformations for parallel computing
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Lock coarsening: eliminating lock overhead in automatically parallelized object-based programs
Journal of Parallel and Distributed Computing
CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Compiler and runtime support for efficient software transactional memory
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Architectural Support for Software Transactional Memory
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
An effective hybrid transactional memory system with strong isolation guarantees
Proceedings of the 34th annual international symposium on Computer architecture
An integrated hardware-software approach to flexible transactional memory
Proceedings of the 34th annual international symposium on Computer architecture
Code Generation and Optimization for Transactional Memory Constructs in an Unmanaged Language
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic performance tuning of word-based software transactional memory
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Compiler and runtime techniques for software transactional memory optimization
Concurrency and Computation: Practice & Experience - Compilers for Parallel Computers 2007 Workshop (CPC 2007)
Optimizing transactions for captured memory
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Discovering and understanding performance bottlenecks in transactional applications
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Transactional Memory, 2nd Edition
Transactional Memory, 2nd Edition
Modeling the Run-time Behavior of Transactional Memory
MASCOTS '10 Proceedings of the 2010 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
An Analytical Model on the Execution of Transactional Memory
SBAC-PAD '10 Proceedings of the 2010 22nd International Symposium on Computer Architecture and High Performance Computing
Lowering STM overhead with static analysis
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Evaluation of Blue Gene/Q hardware support for transactional memories
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
TagTM - accelerating STMs with hardware tags for fast meta-data access
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Previous studies in software transactional memory mostly focused on reducing the overhead of transactional read and write operations. In this article, we introduce transaction coalescing, a profile-guided compiler optimization technique that attempts to reduce the overheads of starting and committing a transaction by merging two or more small transactions into one large transaction. We develop a profiling tool and a transaction coalescing heuristic to identify candidate transactions suitable for coalescing. We implement a compiler extension to automatically merge the candidate transactions at the compile time. We evaluate the effectiveness of our technique using the hash table micro-benchmark and the STAMP benchmark suite. Transaction coalescing improves the performance of the hash table significantly and the performance of Vacation and SSCA2 benchmarks by 19.4% and 36.4%, respectively, when running with 12 threads.