Transactional Memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, priority inversion, convoying). For TM to be adopted in the long term, it must not only deliver on these promises but also scale to a large number of processors. To date, proposals for scalable TM have relegated livelock issues to user-level contention managers. This paper presents the first scalable TM implementation for directory-based distributed shared memory systems that is livelock-free without the need for user-level intervention. The design is a scalable implementation of optimistic concurrency control that supports parallel commits with a two-phase commit protocol, uses write-back caches, and filters coherence messages. It is based on Transactional Coherence and Consistency (TCC), which supports continuous transactions and fault isolation. A performance evaluation using both scientific and enterprise benchmarks demonstrates that the directory-based TCC design scales efficiently on NUMA systems with up to 64 processors.
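To make the idea of optimistic concurrency control concrete, here is a minimal software sketch in Python. It is not the paper's hardware protocol: it has no directories, no two-phase commit, and no coherence filtering. The names `Transaction` and `commit`, and the rule that a committing transaction always wins, are assumptions chosen for illustration. The sketch buffers writes until commit and aborts any other transaction that has read an address the committer wrote. Because every conflict leaves one transaction committed, some transaction always makes progress, which is the intuition behind freedom from livelock.

```python
class Transaction:
    """Hypothetical toy transaction: tracks reads, buffers writes until commit."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory        # shared committed state (a plain dict here)
        self.read_set = set()       # addresses read from committed state
        self.write_buffer = {}      # speculative writes, invisible to others
        self.aborted = False

    def read(self, addr):
        # A transaction sees its own speculative writes first.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return self.memory.get(addr, 0)

    def write(self, addr, value):
        self.write_buffer[addr] = value


def commit(tx, memory, active):
    """Assumed 'committer wins' rule: publish writes, abort conflicting readers."""
    if tx.aborted:
        raise RuntimeError(f"{tx.name} was aborted and must be retried")
    for addr, value in tx.write_buffer.items():
        memory[addr] = value
    for other in active:
        # Any other transaction that read an address we wrote holds stale data.
        if other is not tx and other.read_set.intersection(tx.write_buffer):
            other.aborted = True


def increment(tx, addr):
    """Example transaction body: read-modify-write of one address."""
    tx.write(addr, tx.read(addr) + 1)


# Two transactions race to increment the same counter.
memory = {"x": 0}
t1 = Transaction("t1", memory)
t2 = Transaction("t2", memory)
increment(t1, "x")
increment(t2, "x")

commit(t1, memory, [t1, t2])      # t1 commits; t2 read a stale "x" and is aborted

if t2.aborted:                    # retry t2 from scratch against the new state
    t2_retry = Transaction("t2", memory)
    increment(t2_retry, "x")
    commit(t2_retry, memory, [t2_retry])
```

In this run the first commit succeeds unconditionally and only the conflicting reader is rolled back, so the retry sees the updated counter and the two increments are both applied.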