Parallel programming is receiving renewed attention with the advent of multi-core CPU architectures. The Transactional Memory (TM) paradigm has the potential to deliver good speedup while making parallel programming easier to adopt. Under low contention, TM programs have been shown to outperform standard lock-based programs. Under high contention, however, the performance of TM programs can degrade. Previous work has shown that either data forwarding or value prediction can improve performance under high contention, but both techniques demand significant changes to the architecture and the coherence protocol, above and beyond those required by TM itself. In this work, we analyze and compare these approaches. Our objective is to find a solution that improves performance without significant hardware additions or changes to the coherence protocol. We observe that, for most transactions, conflicts are limited to only a few threads at a time. We use this observation to design a TM system that avoids conflicts through early value communication while requiring less hardware. Our results show that the system achieves performance comparable to that of the previously proposed techniques with minimal extra hardware.