Simplifying concurrent algorithms by exploiting hardware transactional memory

Authors:
Dave Dice;Yossi Lev;Virendra J. Marathe;Mark Moir;Dan Nussbaum;Marek Olszewski
Affiliations:
Sun Labs, Oracle, Burlington, MA, USA;Sun Labs, Oracle & Brown University, Burlington, MA, USA;Sun Labs, Oracle, Burlington, MA, USA;Sun Labs, Oracle, Burlington, MA, USA;Sun Labs, Oracle, Burlington, MA, USA;Sun Labs, Oracle & MIT, Burlington, MA, USA
Venue:
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Year:
2010

Abstract

We explore the potential of hardware transactional memory (HTM) to improve concurrent algorithms. We illustrate a number of use cases in which HTM enables significantly simpler code to achieve similar or better performance than existing algorithms for conventional architectures. We use Sun's prototype multicore chip, code-named Rock, to experiment with these algorithms, and discuss ways in which its limitations prevent better results, or would prevent production use of the algorithms even where they are successful. Our use cases include concurrent data structures such as double-ended queues, work-stealing queues, and scalable non-zero indicators, as well as a scalable malloc implementation and a simulated annealing application. We believe that our paper makes a compelling case that HTM has substantial potential to make effective concurrent programming easier, and that we have made valuable contributions in guiding designers of future HTM features to exploit this potential.