A Scalable, Non-blocking Approach to Transactional Memory

  • Authors:
  • Hassan Chafi, Jared Casper, Brian D. Carlstrom, Austen McDonald, Chi Cao Minh, Woongki Baek, Christos Kozyrakis, Kunle Olukotun

  • Affiliations:
  • Computer Systems Laboratory, Stanford University (all authors). {hchafi, jaredc, bdc, austenmc, caominh, wbaek, kozyrakis, kunle}@stanford.edu

  • Venue:
  • HPCA '07: Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
  • Year:
  • 2007

Abstract

Transactional Memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, priority inversion, convoying). For TM to be adopted in the long term, it must not only deliver on these promises but also scale to a large number of processors. To date, proposals for scalable TM have relegated livelock issues to user-level contention managers. This paper presents the first scalable TM implementation for directory-based distributed shared memory systems that is livelock-free without the need for user-level intervention. The design is a scalable implementation of optimistic concurrency control that supports parallel commits with a two-phase commit protocol, uses write-back caches, and filters coherence messages. The scalable design is based on Transactional Coherence and Consistency (TCC), which supports continuous transactions and fault isolation. A performance evaluation of the design using both scientific and enterprise benchmarks demonstrates that the directory-based TCC design scales efficiently for NUMA systems up to 64 processors.
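
The abstract's central mechanism, parallel commits arbitrated through directories with a two-phase protocol, can be illustrated with a small software model. The Java sketch below is an assumption-laden analogy, not the paper's hardware design: the Directory, Txn, and tryCommit names, the address-interleaved home() mapping, and the per-address version counters used for validation are all hypothetical. Acquiring the home directories' commit locks in a fixed global order stands in here for the hardware's commit arbitration; two committers whose accesses map to disjoint directories proceed in parallel, which is the property the protocol's scalability rests on.

    import java.util.*;
    import java.util.concurrent.locks.ReentrantLock;

    // Software model of directory-arbitrated, two-phase transactional commit.
    // All class and method names are illustrative; the real protocol operates
    // on cache lines and coherence messages, not Java objects.
    final class Directory {
        final ReentrantLock commitLock = new ReentrantLock(); // serializes commits at this home
        final Map<Long, Long> version = new HashMap<>();      // address -> commit version
        long versionOf(long addr) { return version.getOrDefault(addr, 0L); }
        void bump(long addr) { version.merge(addr, 1L, Long::sum); }
    }

    final class Txn {
        final Map<Long, Long> readSet = new HashMap<>(); // address -> version seen at first read
        final Set<Long> writeSet = new HashSet<>();      // addresses written speculatively
    }

    final class TwoPhaseCommit {
        static final int NUM_DIRS = 4;
        static final Directory[] dirs = new Directory[NUM_DIRS];
        static { for (int i = 0; i < NUM_DIRS; i++) dirs[i] = new Directory(); }

        // Address-interleaved home directories (an assumption of this sketch).
        static int home(long addr) { return (int) (addr % NUM_DIRS); }

        // Phase 1: obtain commit permission from every home directory the
        // transaction touches, in a fixed global order so committers cannot
        // deadlock; transactions on disjoint directories commit in parallel.
        // Phase 2: validate the read-set and, if clean, publish the writes.
        static boolean tryCommit(Txn t) {
            SortedSet<Integer> homes = new TreeSet<>();
            for (long a : t.writeSet) homes.add(home(a));
            for (long a : t.readSet.keySet()) homes.add(home(a)); // cover reads so validation is atomic
            for (int h : homes) dirs[h].commitLock.lock();
            try {
                for (Map.Entry<Long, Long> r : t.readSet.entrySet())
                    if (dirs[home(r.getKey())].versionOf(r.getKey()) != r.getValue())
                        return false; // another commit invalidated a read: re-execute
                for (long a : t.writeSet) dirs[home(a)].bump(a); // writes become visible
                return true;
            } finally {
                for (int h : homes) dirs[h].commitLock.unlock();
            }
        }

        public static void main(String[] args) {
            Txn t1 = new Txn();
            t1.writeSet.add(1L);
            System.out.println("t1 commits: " + tryCommit(t1)); // true: no conflict
            Txn t2 = new Txn();
            t2.readSet.put(1L, 0L); // stale: t1 already bumped address 1
            t2.writeSet.add(2L);
            System.out.println("t2 commits: " + tryCommit(t2)); // false: validation fails
        }
    }

Note that simple version validation aborts the loser outright; the paper's contribution is a hardware protocol that additionally guarantees livelock freedom without a user-level contention manager, a property this toy model does not attempt to reproduce.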