Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Language support for lightweight transactions
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
Transactional Memory Coherence and Consistency
Proceedings of the 31st annual international symposium on Computer architecture
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Architectural Support for Software Transactional Memory
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
An effective hybrid transactional memory system with strong isolation guarantees
Proceedings of the 34th annual international symposium on Computer architecture
Nested parallelism in transactional memory
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Intelligent speculation for pipelined multithreading
Intelligent speculation for pipelined multithreading
Leveraging parallel nesting in transactional memory
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Implementing and evaluating nested parallel transactions in software transactional memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
DISC'06 Proceedings of the 20th international conference on Distributed Computing
Delegation and nesting in best-effort hardware transactional memory
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Hi-index | 0.00 |
Transactional Memory (TM) simplifies parallel programming by supporting parallel tasks that execute in an atomic and isolated way. To achieve the best possible performance, TM must support the nested parallelism available in real-world applications and supported by popular programming models. A few recent papers have proposed support for nested parallelism in software TM (STM) and hardware TM (HTM). However, the proposed designs are still impractical, as they either introduce excessive runtime overheads or require complex hardware structures. This paper presents filter-accelerated, nested TM (FaNTM). We extend a hybrid TM based on hardware signatures to provide practical support for nested parallel transactions. In the FaNTM design, hardware filters provide continuous and nesting-aware conflict detection, which effectively eliminates the excessive overheads of software nested transactions. In contrast to a full HTM approach, FaNTM simplifies hardware by decoupling nested parallel transactions from caches using hardware filters. We also describe subtle correctness and liveness issues that do not exist in the non-nested baseline TM. We quantify the performance of FaNTM using STAMP applications and microbenchmarks that use concurrent data structures. First, we demonstrate that the runtime overhead of FaNTM is small (2.3% on average) when applications use only single-level parallelism. Second, we show that the incremental performance overhead of FaNTM is reasonable when the available parallelism is used in deeper nesting levels. We also demonstrate that nested parallel transactions on FaNTM run significantly faster (e.g., 12.4x) than those on a nested STM. Finally, we show how nested parallelism is used to improve the overall performance of a transactional microbenchmark.