FastLane: improving performance of software transactional memory for low thread counts

Authors:
Jons-Tobias Wamhoff; Christof Fetzer; Pascal Felber; Etienne Rivière; Gilles Muller
Affiliations:
Technische Universität Dresden, Dresden, Germany; Technische Universität Dresden, Dresden, Germany; Université de Neuchâtel, Neuchâtel, Switzerland; Université de Neuchâtel, Neuchâtel, Switzerland; INRIA, Paris, France
Venue:
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2013

Abstract

Software transactional memory (STM) can lead to scalable implementations of concurrent programs, as an application's relative performance increases with the number of threads that execute it. However, absolute performance is typically impaired by the overheads of transaction management and of instrumented accesses to shared memory. As a result, STM-based programs running with few threads often perform worse than a sequential, non-instrumented version of the same application. In this paper, we propose FastLane, a new STM algorithm that bridges the performance gap between sequential execution and classical STM algorithms when running on few cores. FastLane seeks to reduce instrumentation costs, and thus performance degradation, in its target operating range. We introduce a novel algorithm that differentiates between two types of threads: one thread (the master) executes transactions pessimistically without ever aborting, and thus with minimal instrumentation and management costs, while the other threads (the helpers) can commit speculative transactions only when they do not conflict with the master. Helpers thus contribute to the application's progress without impairing the master's performance. We implement FastLane as an extension of a state-of-the-art STM runtime system and compiler. Multiple code paths are produced for execution on a single core, a few cores, and many cores. The runtime system selects the code path providing the best throughput, depending on the number of cores available on the target machine. Evaluation results indicate that our approach provides promising performance at low thread counts: FastLane almost systematically outperforms a classical STM with 1 to 6 threads, and often performs better than sequential execution of the non-instrumented version of the same application starting at 2 threads.
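The master/helper split can be illustrated with a small Python toy model. This is a hedged sketch, not FastLane's actual implementation: the names `FastLaneSketch`, `HelperTx`, `Abort`, and `try_commit`, the per-address version counters, and the choice of a single lock that the master holds for its whole transaction are all assumptions made for illustration. The master writes in place with no undo log and never aborts; helpers buffer their writes and validate their reads at commit time, aborting if any location they read was overwritten since. The model relies on CPython executing these list operations in program order; a real implementation in C would need atomic operations and memory fences.

```python
import threading


class Abort(Exception):
    """Raised when a helper transaction conflicts and must be retried."""


class FastLaneSketch:
    """Toy shared memory with one pessimistic master (hypothetical design)."""

    def __init__(self, size):
        self.memory = [0] * size
        self.version = [0] * size            # per-address write counters (assumption)
        self.commit_lock = threading.Lock()  # held by the master for its whole transaction

    # Master path: pessimistic, never aborts, minimal instrumentation.
    def master_begin(self):
        self.commit_lock.acquire()           # may briefly wait for a committing helper

    def master_read(self, addr):
        return self.memory[addr]             # uninstrumented: no helper can commit now

    def master_write(self, addr, value):
        self.memory[addr] = value            # in place, no undo log needed
        self.version[addr] += 1              # bump after the write so helpers notice

    def master_commit(self):
        self.commit_lock.release()


class HelperTx:
    """Speculative helper transaction with a redo log."""

    def __init__(self, stm):
        self.stm = stm
        self.reads = {}    # addr -> version observed at first read
        self.writes = {}   # addr -> buffered value

    def read(self, addr):
        if addr in self.writes:
            return self.writes[addr]         # read our own buffered write
        ver = self.stm.version[addr]         # read the version before the value
        value = self.stm.memory[addr]
        if self.reads.setdefault(addr, ver) != ver:
            raise Abort()                    # location changed since our first read
        return value

    def write(self, addr, value):
        self.writes[addr] = value            # buffered until commit

    def commit(self):
        with self.stm.commit_lock:           # waits while the master is running
            for addr, ver in self.reads.items():
                if self.stm.version[addr] != ver:
                    raise Abort()            # conflicts with a committed write
            for addr, value in self.writes.items():
                self.stm.memory[addr] = value
                self.stm.version[addr] += 1


def try_commit(tx):
    """Return True if the helper committed, False if it aborted."""
    try:
        tx.commit()
        return True
    except Abort:
        return False


stm = FastLaneSketch(2)

# Helper 1 reads cell 0, then the master overwrites cell 0: conflict.
h1 = HelperTx(stm)
h1.write(1, h1.read(0) + 10)
stm.master_begin()
stm.master_write(0, stm.master_read(0) + 1)
stm.master_commit()
print(try_commit(h1), stm.memory)   # False [1, 0]

# Helper 2 only touches cell 1, which the master does not write: commits.
h2 = HelperTx(stm)
h2.write(1, h2.read(1) + 10)
stm.master_begin()
stm.master_write(0, stm.master_read(0) + 1)
stm.master_commit()
print(try_commit(h2), stm.memory)   # True [2, 10]
```

In this sketch a helper's abort never affects the master, which matches the property described in the abstract, but helpers can observe inconsistent intermediate states before aborting (no opacity), and they also conflict with each other through the shared version counters.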