Transactional Memory (TM) offers new possibilities for algorithmic design. This paper evaluates TM implementations of two algorithmic variants of the widespread conjugate gradient (CG) method with respect to their performance on multi-core CPUs employing TM. Using tools that visualize the behavior of TM applications, we show that the main bottleneck for both variants is waiting time at barriers, and we illustrate how reduction operations can be implemented beneficially with TM. Performance monitoring through the PAPI interface reveals the number and type of instructions that each algorithm requires. This basic work is key to environment-aware numerics and offers guidance to software developers who plan to use TM.