The STAMPede approach to thread-level speculation

Authors:
J. Gregory Steffan;Christopher Colohan;Antonia Zhai;Todd C. Mowry
Affiliations:
University of Toronto, Ontario, Canada;Carnegie Mellon University, Pittsburgh, PA;University of Minnesota, Minneapolis, MN;Carnegie Mellon University, Pittsburgh, PA
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
2005

Citing 43
Cited 34

Compilers: principles, techniques, and tools

Compilers: principles, techniques, and tools
Lazy code motion

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A study of single-chip processor/cache organizations for large numbers of transistors

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The anatomy of the register file in a multiscalar processor

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Speeding up irregular applications in shared-memory multiprocessors: memory binding and group prefetching

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References

IEEE Transactions on Computers
The case for a single-chip multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Lazy threads: implementing a fast parallel call

Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Trace processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A dynamic multithreading processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clustered speculative multithreaded processors

ICS '99 Proceedings of the 13th international conference on Supercomputing
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Proceedings of the 27th annual international symposium on Computer architecture
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor

ICS '01 Proceedings of the 15th international conference on Supercomputing
Removing architectural bottlenecks to the scalability of speculative parallelization

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Techniques for speculative run-time parallelization of loops

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Compiler optimization of scalar value communication between speculative threads

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
Master/slave speculative parallelization

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Using thread-level speculation to simplify manual parallelization

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Toward efficient and robust software speculative parallelization on multiprocessors

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware for Speculative Run-Time Parallelization in Distributed Shared-Memory Multiprocessors

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
In Search of Speculative Thread-Level Parallelism

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Implicitly-multithreaded processors

Proceedings of the 30th annual international symposium on Computer architecture
Speculative Data-Driven Multithreading

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Thread-Spawning Schemes for Speculative Multithreading

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Improving Value Communication for Thread-Level Speculation

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Compiling for the multiscalar architecture

Compiling for the multiscalar architecture
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Hardware support for thread-level speculation

Hardware support for thread-level speculation

SableSpMT: a software framework for analysing speculative multithreading in Java

PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
A probabilistic pointer analysis for speculative optimizations

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Software behavior oriented parallelization

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Incrementally parallelizing database transactions with thread-level speculation

ACM Transactions on Computer Systems (TOCS)
The potential for variable-granularity access tracking for optimistic parallelism

Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08)
Parallel-stage decoupled software pipelining

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Spice: speculative parallel iteration chunk execution

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Compiler and hardware support for reducing the synchronization of speculative threads

ACM Transactions on Architecture and Code Optimization (TACO)
Counting Dependence Predictors

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Performance scalability of decoupled software pipelining

ACM Transactions on Architecture and Code Optimization (TACO)
Multi-execution: multicore caching for data-similar executions

Proceedings of the 36th annual international symposium on Computer architecture
A memory system design framework: creating smart memories

Proceedings of the 36th annual international symposium on Computer architecture
SPARTAN: A software tool for Parallelization Bottleneck Analysis

IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Speculative parallelization using software multi-threaded transactions

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Decoupled software pipelining creates parallelization opportunities

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Speculative parallelization of partial reduction variables

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Estimating and exploiting potential parallelism by source-level dependence profiling

EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Improving cache locality for thread-level speculation

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Scalable Speculative Parallelization on Commodity Clusters

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting coarse-grain speculative parallelism

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Supporting speculative multithreading on simultaneous multithreaded processors

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Software thread level speculation for the java language and virtual machine environment

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Paragon: collaborative speculative loop execution on GPU and CPU

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
A model-driven approach for software parallelization

MODELS'11 Proceedings of the 2011th international conference on Models in Software Engineering
Automatic speculative DOALL for clusters

Proceedings of the Tenth International Symposium on Code Generation and Optimization
HiRe: using hint & release to improve synchronization of speculative threads

Proceedings of the 26th ACM international conference on Supercomputing
Dynamically dispatching speculative threads to improve sequential execution

ACM Transactions on Architecture and Code Optimization (TACO)
Unifying thread-level speculation and transactional memory

Proceedings of the 13th International Middleware Conference
Computational caches

Proceedings of the 6th International Systems and Storage Conference
The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution

ACM Transactions on Architecture and Code Optimization (TACO)
ASC: automatically scalable computation

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Challenging the "embarrassingly sequential": parallelizing finite state machine-based computations through principled speculation

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Efficient execution of speculative threads and transactions with hardware transactional memory

Future Generation Computer Systems
Leveraging GPUs using cooperative loop speculation

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multithreaded processor architectures are becoming increasingly commonplace: many current and upcoming designs support chip multiprocessing, simultaneous multithreading, or both. While it is relatively straightforward to use these architectures to improve the throughput of a multithreaded or multiprogrammed workload, the real challenge is how to easily create parallel software to allow single programs to effectively exploit all of this raw performance potential. One promising technique for overcoming this problem is Thread-Level Speculation (TLS), which enables the compiler to optimistically create parallel threads despite uncertainty as to whether those threads are actually independent. In this article, we propose and evaluate a design for supporting TLS that seamlessly scales both within a chip and beyond because it is a straightforward extension of write-back invalidation-based cache coherence (which itself scales both up and down). Our experimental results demonstrate that our scheme performs well on single-chip multiprocessors where the first level caches are either private or shared. For our private-cache design, the program performance of two of 13 general purpose applications studied improves by 86% and 56%, four others by more than 8%, and an average across all applications of 16%---confirming that TLS is a promising way to exploit the naturally-multithreaded processing resources of future computer systems.