Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors

Authors:
María Jesús Garzarán;Milos Prvulovic;José María Llabería;Víctor Viñals;Lawrence Rauchwerger;Josep Torrellas
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;Georgia Institute of Technology;Universitat Politècnica de Catalunya, Spain;Universidad de Zaragoza, Spain;Texas A&M University;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2005

Citing 21
Cited 11

Implementing Precise Interrupts in Pipelined Processors

IEEE Transactions on Computers
The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References

IEEE Transactions on Computers
On the Automatic Parallelization of the Perfect Benchmarks®

IEEE Transactions on Parallel and Distributed Systems
A dynamic multithreading processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clustered speculative multithreaded processors

ICS '99 Proceedings of the 13th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
The Superthreaded Processor Architecture

IEEE Transactions on Computers
An architecture for mostly functional languages

LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Proceedings of the 27th annual international symposium on Computer architecture
Removing architectural bottlenecks to the scalability of speculative parallelization

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Techniques for speculative run-time parallelization of loops

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Parallel Programming with Polaris

Computer
Hardware Support for Extracting Coarse-Grain Speculative Parallelism in Distributed Shared-Memory Multiprocesors

ICPP '02 Proceedings of the 2001 International Conference on Parallel Processing
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors

Hardware for speculative run-time parallelization in distributed shared-memory multiprocessors
Using Software Logging to Support Multi-Version Buffering in Thread-Level Speculation

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques

Bulk Disambiguation of Speculative Threads in Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
Dependence-aware transactional memory for increased concurrency

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Improving performance of simple cores by exploiting loop-level parallelism through value prediction and reconfiguration

Proceedings of the 6th ACM conference on Computing frontiers
On the efficacy of call graph-level thread-level speculation

Proceedings of the first joint WOSP/SIPEW international conference on Performance engineering
Speculative parallelization using software multi-threaded transactions

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Supporting speculative parallelization in the presence of dynamic data structures

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Speculative parallelization using state separation and multiple value prediction

Proceedings of the 2010 international symposium on Memory management
Enhanced speculative parallelization via incremental recovery

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
SCIN-cache: Fast speculative versioning in multithreaded cores

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Speculative parallelization: eliminating the overhead of failure

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
IBM Blue Gene/Q memory subsystem with speculative execution and transactional memory

IBM Journal of Research and Development

Quantified Score

Hi-index	0.00

Visualization

Abstract

Thread-Level Speculation (TLS) provides architectural support to aggressively run hard-to-analyze code in parallel. As speculative tasks run concurrently, they generate unsafe or speculative memory state that needs to be separately buffered and managed in the presence of distributed caches and buffers. Such a state may contain multiple versions of the same variable. In this paper, we introduce a novel taxonomy of approaches to buffer and manage multiversion speculative memory state in multiprocessors. We also present a detailed complexity-benefit tradeoff analysis of the different approaches. Finally, we use numerical applications to evaluate the performance of the approaches under a single architectural framework. Our key insights are that support for buffering the state of multiple speculative tasks and versions per processor is more complexity-effective than support for lazily merging the state of tasks with main memory. Moreover, both supports can be gainfully combined and, in large machines, their effect is nearly fully additive. Finally, the more complex support for storing future state in the main memory can boost performance when buffers are under pressure, but hurts performance when squashes are frequent.