Thread-level speculation (TLS) is a technique that allows parts of a sequential program to be executed in parallel. TLS ensures that the parallel program's behaviour remains true to the language's original sequential semantics, for example by allowing multiple iterations of a loop to run in parallel when there are no conflicts between them. Conventional software-TLS algorithms detect conflicts dynamically, and they suffer from a number of problems. They can impose large storage overheads caused by buffering speculative work. They can offer disappointing scalability if threads can commit speculative work back to the "real" heap only sequentially. They can be slow, either because speculative reads must consult look-aside tables to see earlier speculative writes, or because speculative operations replace normal reads and writes with expensive synchronisation primitives (e.g. CAS or memory fences). We present a streamlined software-TLS algorithm for mostly parallel loops that aims to avoid these problems. Speculative work is performed in place, so no buffering is needed and reads naturally see earlier writes. We avoid the need for a serial-commit protocol, and common operations require neither CAS nor memory fences. We strive to reduce the size of TLS-related conflict-detection state and to interact well with typical data-cache implementations. We evaluate our implementation on off-the-shelf hardware using seven applications from SciMark2, BYTEmark and JOlden. For fully parallel loops, we achieve an average of 77% of the speed-up of manually parallelized versions of the benchmarks. The maximum speed-up we achieve is 5.8x on an 8-core machine.
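To make the overheads above concrete, here is a minimal Java sketch of the *conventional* buffered software-TLS scheme that the abstract argues against. It is not the authors' in-place algorithm. Each speculative iteration keeps its writes in a private map, which acts as the look-aside table, so every read must check that map before the heap. Buffered work is then committed back to the heap one iteration at a time, in loop order. Any iteration that read a location written by an earlier, already-committed iteration is squashed and re-run. All names (`BufferedTls`, `SpecIteration`, `run`) and the loop body `a[i] = a[i] + a[idx[i]]` are hypothetical. The "parallel" phase is simulated sequentially so the example stays deterministic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of conventional buffered software TLS (not the paper's algorithm).
final class BufferedTls {
    /** Speculative state for one loop iteration. */
    static final class SpecIteration {
        final Map<Integer, Integer> writeBuffer = new HashMap<>(); // buffered speculative writes
        final Set<Integer> readSet = new HashSet<>();              // locations read from the shared heap

        int read(int[] heap, int index) {
            // Look-aside: this iteration's own earlier writes must be visible to its reads.
            Integer buffered = writeBuffer.get(index);
            if (buffered != null) {
                return buffered;
            }
            readSet.add(index);
            return heap[index];
        }

        void write(int index, int value) {
            writeBuffer.put(index, value); // the shared heap is not touched until commit
        }
    }

    /** Hypothetical loop body: a[i] = a[i] + a[idx[i]]. */
    static void body(SpecIteration it, int[] heap, int[] idx, int i) {
        int v = it.read(heap, i) + it.read(heap, idx[i]);
        it.write(i, v);
    }

    /**
     * Runs all iterations speculatively against the initial heap, then commits them
     * serially in iteration order. Returns the number of squashed (re-executed) iterations.
     */
    static int run(int[] heap, int[] idx) {
        int n = idx.length;
        SpecIteration[] its = new SpecIteration[n];
        for (int i = 0; i < n; i++) {            // "parallel" phase, simulated sequentially
            its[i] = new SpecIteration();
            body(its[i], heap, idx, i);
        }
        Set<Integer> committed = new HashSet<>(); // locations written by already-committed iterations
        int reexecuted = 0;
        for (int i = 0; i < n; i++) {            // serial commit protocol
            boolean conflict = false;
            for (int loc : its[i].readSet) {
                if (committed.contains(loc)) {
                    conflict = true;
                    break;
                }
            }
            if (conflict) {
                // All earlier iterations are committed, so re-running now sees correct values.
                its[i] = new SpecIteration();
                body(its[i], heap, idx, i);
                reexecuted++;
            }
            for (Map.Entry<Integer, Integer> e : its[i].writeBuffer.entrySet()) {
                heap[e.getKey()] = e.getValue(); // second copy of every write: buffer -> heap
                committed.add(e.getKey());
            }
        }
        return reexecuted;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int squashed = run(a, new int[] {0, 0, 1, 2}); // each iteration depends on the previous one
        System.out.println(Arrays.toString(a) + " re-executed=" + squashed);
    }
}
```

With the fully parallel index array `{0, 1, 2, 3}`, no iteration is squashed. With `{0, 0, 1, 2}`, every iteration after the first reads its predecessor's result, so three are re-executed and the heap ends as `[2, 4, 7, 11]`, matching sequential execution. Even in the conflict-free case, each read pays for a buffer lookup and each write is copied twice. Commits also proceed strictly in order. These are the costs the abstract's in-place, non-serial-commit design aims to avoid.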