A Scalable, Non-blocking Approach to Transactional Memory

  • Authors:
  • Hassan Chafi, Jared Casper, Brian D. Carlstrom, Austen McDonald, Chi Cao Minh, Woongki Baek, Christos Kozyrakis, Kunle Olukotun

  • Affiliations:
  • Computer Systems Laboratory, Stanford University (all authors). {hchafi, jaredc, bdc, austenmc, caominh, wbaek, kozyrakis, kunle}@stanford.edu

  • Venue:
  • HPCA '07: Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
  • Year:
  • 2007

Abstract

Transactional Memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, priority inversion, convoying). For TM to be adopted in the long term, it must not only deliver on these promises but also scale to a large number of processors. To date, proposals for scalable TM have relegated livelock issues to user-level contention managers. This paper presents the first scalable TM implementation for directory-based distributed shared memory systems that is livelock-free without the need for user-level intervention. The design is a scalable implementation of optimistic concurrency control that supports parallel commits with a two-phase commit protocol, uses write-back caches, and filters coherence messages. The scalable design is based on Transactional Coherence and Consistency (TCC), which supports continuous transactions and fault isolation. A performance evaluation of the design using both scientific and enterprise benchmarks demonstrates that the directory-based TCC design scales efficiently for NUMA systems up to 64 processors.
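
The abstract's central mechanism, parallel commits arbitrated through directories with a two-phase protocol, can be illustrated with a small software model. The Java sketch below is an assumption-laden analogy, not the paper's hardware design: the Directory, Txn, and tryCommit names, the address-interleaved home() mapping, and the per-address version counters used for validation are all hypothetical. Acquiring the home directories' commit locks in a fixed global order stands in here for the hardware's commit arbitration; two committers whose accesses map to disjoint directories proceed in parallel, which is the property the protocol's scalability rests on.

    import java.util.*;
    import java.util.concurrent.locks.ReentrantLock;

    // Software model of directory-arbitrated, two-phase transactional commit.
    // All class and method names are illustrative; the real protocol operates
    // on cache lines and coherence messages, not Java objects.
    final class Directory {
        final ReentrantLock commitLock = new ReentrantLock(); // serializes commits at this home
        final Map<Long, Long> version = new HashMap<>();      // address -> commit version
        long versionOf(long addr) { return version.getOrDefault(addr, 0L); }
        void bump(long addr) { version.merge(addr, 1L, Long::sum); }
    }

    final class Txn {
        final Map<Long, Long> readSet = new HashMap<>(); // address -> version seen at first read
        final Set<Long> writeSet = new HashSet<>();      // addresses written speculatively
    }

    final class TwoPhaseCommit {
        static final int NUM_DIRS = 4;
        static final Directory[] dirs = new Directory[NUM_DIRS];
        static { for (int i = 0; i < NUM_DIRS; i++) dirs[i] = new Directory(); }

        // Address-interleaved home directories (an assumption of this sketch).
        static int home(long addr) { return (int) (addr % NUM_DIRS); }

        // Phase 1: obtain commit permission from every home directory the
        // transaction touches, in a fixed global order so committers cannot
        // deadlock; transactions on disjoint directories commit in parallel.
        // Phase 2: validate the read-set and, if clean, publish the writes.
        static boolean tryCommit(Txn t) {
            SortedSet<Integer> homes = new TreeSet<>();
            for (long a : t.writeSet) homes.add(home(a));
            for (long a : t.readSet.keySet()) homes.add(home(a)); // cover reads so validation is atomic
            for (int h : homes) dirs[h].commitLock.lock();
            try {
                for (Map.Entry<Long, Long> r : t.readSet.entrySet())
                    if (dirs[home(r.getKey())].versionOf(r.getKey()) != r.getValue())
                        return false; // another commit invalidated a read: re-execute
                for (long a : t.writeSet) dirs[home(a)].bump(a); // writes become visible
                return true;
            } finally {
                for (int h : homes) dirs[h].commitLock.unlock();
            }
        }

        public static void main(String[] args) {
            Txn t1 = new Txn();
            t1.writeSet.add(1L);
            System.out.println("t1 commits: " + tryCommit(t1)); // true: no conflict
            Txn t2 = new Txn();
            t2.readSet.put(1L, 0L); // stale: t1 already bumped address 1
            t2.writeSet.add(2L);
            System.out.println("t2 commits: " + tryCommit(t2)); // false: validation fails
        }
    }

Note that simple version validation aborts the loser outright; the paper's contribution is a hardware protocol that additionally guarantees livelock freedom without a user-level contention manager, a property this toy model does not attempt to reproduce.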