Compiler optimization of scalar value communication between speculative threads

Authors:
Antonia Zhai;Christopher B. Colohan;J. Gregory Steffan;Todd C. Mowry
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Year:
2002

Citing 27
Cited 43

A Scheme to Enforce Data Dependence on Large Multiprocessor Systems

IEEE Transactions on Software Engineering
Compiler algorithms for synchronization

IEEE Transactions on Computers
Lazy code motion

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
The multiscalar architecture

The multiscalar architecture
Dynamic memory disambiguation using the memory conflict buffer

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Efficient path profiling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
Improving data-flow analysis with path profiles

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A dynamic multithreading processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Clustered speculative multithreaded processors

ICS '99 Proceedings of the 13th international conference on Supercomputing
The Superthreaded Processor Architecture

IEEE Transactions on Computers
Value prediction for speculative multithreaded architectures

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Techniques for speculative run-time parallelization of loops

SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Compiler optimization of scalar value communication between speculative threads

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
Run-Time Disambiguation: Coping with Statically Unpredictable Dependencies

IEEE Transactions on Computers
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
In Search of Speculative Thread-Level Parallelism

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Improving Value Communication for Thread-Level Speculation

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Compiling for the multiscalar architecture

Compiling for the multiscalar architecture
Master/slave speculative parallelization and approximate code

Master/slave speculative parallelization and approximate code
Hardware support for thread-level speculation

Hardware support for thread-level speculation

Compiler optimization of scalar value communication between speculative threads

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
TEST: a tracer for extracting speculative threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Using thread-level speculation to simplify manual parallelization

Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
The Jrpm system for dynamically parallelizing Java programs

Proceedings of the 30th annual international symposium on Computer architecture
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A cost-driven compilation framework for speculative parallelization of sequential programs

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
A General Compiler Framework for Speculative Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Reactive Techniques for Controlling Software Speculation

Proceedings of the international symposium on Code generation and optimization
Exposing speculative thread parallelism in SPEC2000

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
The STAMPede approach to thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation

Proceedings of the 19th annual international conference on Supercomputing
POSH: a TLS compiler that exploits program structure

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Tolerating Dependences Between Large Speculative Threads Via Sub-Threads

Proceedings of the 33rd annual international symposium on Computer Architecture
A lifetime optimal algorithm for speculative PRE

ACM Transactions on Architecture and Code Optimization (TACO)
CellSs: a programming model for the cell BE architecture

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
A compiler cost model for speculative parallelization

ACM Transactions on Architecture and Code Optimization (TACO)
Incrementally parallelizing database transactions with thread-level speculation

ACM Transactions on Computer Systems (TOCS)
CellSs: making it easier to program the cell broadband engine processor

IBM Journal of Research and Development
Compiler optimizations for parallelizing general-purpose applications under thread-level speculation

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Communication optimizations for global multi-threaded instruction scheduling

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
On the potential of latency tolerant execution in speculative multithreading

IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Set-Congruence Dynamic Analysis for Thread-Level Speculation (TLS)

Languages and Compilers for Parallel Computing
Compiler-Driven Dependence Profiling to Guide Program Parallelization

Languages and Compilers for Parallel Computing
Fast Track: A Software System for Speculative Program Optimization

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Dynamic performance tuning for speculative threads

Proceedings of the 36th annual international symposium on Computer architecture
A lightweight in-place implementation for software thread-level speculation

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
Performance estimation for task graphs combining sequential path profiling and control dependence regions

MEMOCODE'09 Proceedings of the 7th IEEE/ACM international conference on Formal Methods and Models for Codesign
Exploiting speculative thread-level parallelism in data compression applications

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Speculative parallelization of partial reduction variables

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Balancing thread partition for efficiently exploiting speculative thread-level parallelism

APPT'07 Proceedings of the 7th international conference on Advanced parallel processing technologies
Speculative parallelization using state separation and multiple value prediction

Proceedings of the 2010 international symposium on Memory management
Energy efficient speculative threads: dynamic thread allocation in Same-ISA heterogeneous multicore systems

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Efficient dynamic program monitoring on multi-core systems

Journal of Systems Architecture: the EUROMICRO Journal
Supporting speculative multithreading on simultaneous multithreaded processors

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Loop selection for thread-level speculation

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Hardware support for multithreaded execution of loops with limited parallelism

PCI'05 Proceedings of the 10th Panhellenic conference on Advances in Informatics
Paragon: collaborative speculative loop execution on GPU and CPU

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Exploiting thread-level speculative parallelism with software value prediction

ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Runtime automatic speculative parallelization

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Dynamically dispatching speculative threads to improve sequential execution

ACM Transactions on Architecture and Code Optimization (TACO)
Disjoint out-of-order execution processor

ACM Transactions on Architecture and Code Optimization (TACO)
The design and implementation of heterogeneous multicore systems for energy-efficient speculative thread execution

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

While there have been many recent proposals for hardware that supports Thread-Level Speculation (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS, which is stalls due to forwarding scalar values between threads that would otherwise cause frequent data dependences. We present and evaluate dataflow algorithms for three increasingly-aggressive instruction scheduling techniques that reduce the critical forwarding path introduced by the synchronization associated with this data forwarding. In addition, we contrast our compiler techniques with related hardware-only approaches. With our most aggressive compiler and hardware techniques, we improve performance under TLS by 6.2-28.5% for 6 of 14 applications, and by at least 2.7% for half of the other applications.