Compiler and hardware support for reducing the synchronization of speculative threads

  • Authors and affiliations:
  • Antonia Zhai (University of Minnesota, Minneapolis, MN); J. Gregory Steffan (University of Toronto, Toronto, Canada); Christopher B. Colohan (Google, Ann Arbor, Michigan); Todd C. Mowry (Carnegie Mellon University, Pittsburgh, Pennsylvania)

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2008

Abstract


Thread-level speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting the parallel execution of threads that might not actually be independent. In this article, we focus on one important limitation of program performance under TLS: stalls caused by synchronizing and forwarding scalar values between speculative threads, which is necessary to avoid the frequent data dependences, and hence failed speculation, that those values would otherwise cause. Using SPECint benchmarks that have been automatically transformed by our compiler to exploit TLS, we present, evaluate in detail, and compare both compiler and hardware techniques for improving the communication of scalar values. We find that, through our dataflow algorithms for three increasingly aggressive instruction-scheduling techniques, the compiler can drastically reduce the critical forwarding path introduced by the synchronization and forwarding of scalar values. We also show that hardware techniques for reducing synchronization can be complementary to compiler scheduling, but that the additional performance benefits are minimal and generally not worth the cost.
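
To make the critical forwarding path concrete, the sketch below (in C, not taken from the paper) shows the kind of loop-carried scalar the abstract refers to. The WAIT_FOR()/SIGNAL() macros are hypothetical stand-ins for TLS value-forwarding primitives; here they expand to nothing so the file compiles and runs sequentially. The comments mark where aggressive instruction scheduling shortens the forwarding path by computing and signaling the forwarded value as early as possible.

    /* A minimal sketch, assuming hypothetical WAIT_FOR()/SIGNAL() forwarding
       primitives (no-op macros here); it only illustrates the critical
       forwarding path discussed in the abstract, not the paper's code. */

    #include <stdio.h>
    #include <stdlib.h>

    #define WAIT_FOR(var)  /* hypothetical: stall until the predecessor thread forwards var */
    #define SIGNAL(var)    /* hypothetical: forward var to the successor thread             */

    struct node { int value; struct node *next; };

    /* Long-latency work that does not touch the loop-carried scalar. */
    static long process(struct node *n) {
        long s = 0;
        for (int i = 0; i < 1000; i++) s += (long)n->value * i;
        return s;
    }

    int main(void) {
        /* Build a small list to traverse. */
        struct node *head = NULL;
        for (int i = 0; i < 8; i++) {
            struct node *n = malloc(sizeof *n);
            n->value = i; n->next = head; head = n;
        }

        long total = 0;
        for (struct node *p = head; p != NULL; /* advanced in the body */) {
            /* Each iteration becomes one speculative thread; the loop-carried
               scalar is the pointer p, which must be forwarded to the next thread. */
            WAIT_FOR(p);
            struct node *next = p->next;   /* compute the forwarded value early...        */
            SIGNAL(next);                  /* ...so the successor thread can start now    */
            total += process(p);           /* the long work now lies outside the critical
                                              forwarding path; a naive schedule would
                                              signal only after this call                  */
            p = next;
        }
        printf("%ld\n", total);

        while (head != NULL) { struct node *dead = head; head = head->next; free(dead); }
        return 0;
    }

Under a naive schedule, the successor thread would wait for the entire call to process() before receiving the pointer; hoisting the pointer update and the signal above that call illustrates the kind of transformation the paper's dataflow scheduling algorithms are designed to perform automatically.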