Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
Software transactional memory for dynamic-sized data structures
Proceedings of the twenty-second annual symposium on Principles of distributed computing
Composable memory transactions
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
McRT-Malloc: a scalable transactional memory allocator
Proceedings of the 5th international symposium on Memory management
An effective hybrid transactional memory system with strong isolation guarantees
Proceedings of the 34th annual international symposium on Computer architecture
Enforcing isolation and ordering in STM
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Code Generation and Optimization for Transactional Memory Constructs in an Unmanaged Language
Proceedings of the International Symposium on Code Generation and Optimization
Privatization techniques for software transactional memory
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
The OpenTM Transactional Application Programming Interface
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Implications of False Conflict Rate Trends for Robust Software Transactional Memory
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
DISC'06 Proceedings of the 20th international conference on Distributed Computing
Software transactional memory: why is it only a research toy?
Communications of the ACM - Remembering Jim Gray
Software Transactional Memory: Why Is It Only a Research Toy?
Queue - The Concurrency Problem
QuakeTM: parallelizing a complex sequential application using transactional memory
Proceedings of the 23rd international conference on Supercomputing
Stretching transactional memory
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Optimizing transactions for captured memory
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Debugging programs that use atomic blocks and transactional memory
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
New abstractions for effective performance analysis of STM programs
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Why the grass may not be greener on the other side: a comparison of locking vs. transactional memory
ACM SIGOPS Operating Systems Review
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Discovering and understanding performance bottlenecks in transactional applications
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A dynamic evaluation of the precision of static heap abstractions
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
DISC'10 Proceedings of the 24th international conference on Distributed computing
Why STM can be more than a research toy
Communications of the ACM
Safe nondeterminism in a deterministic-by-default parallel language
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Commutative set: a language extension for implicit parallel programming
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Transactional conflict decoupling and value prediction
Proceedings of the international conference on Supercomputing
STM with transparent API considered harmful
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
A speculation-friendly binary search tree
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
The runtime abort graph and its application to software transactional memory optimization
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Reconciling transactional conflicts with compiler's help
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Runtime elision of transactional barriers for captured memory
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
TagTM - accelerating STMs with hardware tags for fast meta-data access
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A survey of support for structured communication in concurrency control models
Journal of Parallel and Distributed Computing
Hi-index | 0.02 |
Transactional Memory (TM) promises to simplify concurrent programming, which has been notoriously difficult but crucial in realizing the performance benefit of multi-core processors. Software Transaction Memory (STM), in particular, represents a body of important TM technologies since it provides a mechanism to run transactional programs when hardware TM support is not available, or when hardware TM resources are exhausted. Nonetheless, most previous researches on STMs were constrained to executing trivial, small-scale workloads. The assumption was that the same techniques applied to small-scale workloads could readily be applied to real-life, large-scale workloads. However, by executing several nontrivial workloads such as particle dynamics simulation and game physics engine on a state of the art STM, we noticed that this assumption does not hold. Specifically, we identified four major performance bottlenecks that were unique to the case of executing large-scale workloads on an STM: false conflicts, over-instrumentation, privatization-safety cost, and poor amortization. We believe that these bottlenecks would be common for any STM targeting real-world applications. In this paper, we describe those identified bottlenecks in detail, and we propose novel solutions to alleviate the issues. We also thoroughly validate these approaches with experimental results on real machines.