A lightweight in-place implementation for software thread-level speculation

Authors:
Cosmin E. Oancea; Alan Mycroft; Tim Harris
Affiliations:
The University of Cambridge, Cambridge, United Kingdom; The University of Cambridge, Cambridge, United Kingdom; Microsoft Research, Cambridge, United Kingdom
Venue:
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Year:
2009

Abstract

Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures the parallel program's behaviour remains true to the language's original sequential semantics; for example, it allows multiple iterations of a loop to run in parallel if there are no conflicts between them. Conventional software-TLS algorithms detect conflicts dynamically, and they suffer from a number of problems. TLS implementations can impose large storage overheads caused by buffering speculative work. They can offer disappointing scalability if threads can only commit speculative work back to the "real" heap sequentially. They can be slow because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences). We present a streamlined software-TLS algorithm for mostly-parallel loops that aims to avoid these problems. We perform speculative work in place, which avoids buffering and lets reads naturally see earlier writes. We avoid needing a serial-commit protocol. We avoid the need for CAS or memory fences in common operations. We strive to reduce the size of TLS-related conflict-detection state, and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. For fully parallel loops, we achieve on average 77% of the speed-up of manually parallelized versions of the benchmarks. We achieve a maximum speed-up of 5.8x on an 8-core machine.
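To make the idea of in-place speculation concrete, the following is a minimal single-threaded simulation, not the authors' algorithm. It assumes a small fixed-size metadata table indexed by `index % table_size`, an undo log for rollback, and a fall-back to sequential execution on any conflict; all names (`InPlaceSpeculation`, `run_loop`, `Conflict`) are hypothetical. It does not model real threads, CAS, or memory fences, so it says nothing about how the paper avoids those costs.

```python
class Conflict(Exception):
    """Raised when an access would violate sequential loop semantics."""


class InPlaceSpeculation:
    def __init__(self, data, table_size=64):
        self.data = data                       # the "real" array, updated in place
        self.table_size = table_size           # assumption: small shared metadata table
        self.last_reader = [-1] * table_size   # highest iteration that read each slot
        self.last_writer = [-1] * table_size   # highest iteration that wrote each slot
        self.undo_log = []                     # assumption: old values kept for rollback

    def _slot(self, index):
        # Several indices may share a slot, which can cause false conflicts.
        return index % self.table_size

    def read(self, it, index):
        s = self._slot(index)
        if self.last_writer[s] > it:           # a later iteration already wrote here
            raise Conflict
        self.last_reader[s] = max(self.last_reader[s], it)
        return self.data[index]                # reads see earlier in-place writes

    def write(self, it, index, value):
        s = self._slot(index)
        if self.last_writer[s] > it or self.last_reader[s] > it:
            raise Conflict                     # a later iteration already used this slot
        self.undo_log.append((index, self.data[index]))
        self.data[index] = value               # no buffering: write directly
        self.last_writer[s] = it

    def rollback(self):
        # Restore old values in reverse order of the writes.
        for index, old in reversed(self.undo_log):
            self.data[index] = old
        self.undo_log.clear()


def run_loop(data, body, schedule, table_size=64):
    """Run loop iterations in `schedule` order, simulating out-of-order execution.

    `body(it, rd, wr)` performs iteration `it` using rd(index) and wr(index, value).
    Returns "speculative" if no conflict occurred, else "sequential".
    """
    spec = InPlaceSpeculation(data, table_size)
    try:
        for it in schedule:
            body(it, lambda i: spec.read(it, i), lambda i, v: spec.write(it, i, v))
        return "speculative"
    except Conflict:
        spec.rollback()
        for it in sorted(schedule):            # re-execute in sequential order
            body(it, lambda i: data[i], lambda i, v: data.__setitem__(i, v))
        return "sequential"
```

A loop whose iterations touch disjoint elements completes speculatively under any schedule, while a loop with a cross-iteration dependence that is run out of order triggers a rollback and sequential re-execution. Shrinking `table_size` reduces metadata at the cost of more false conflicts, which is one way to picture the trade-off the abstract mentions around the size of conflict-detection state.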