A compiler cost model for speculative parallelization

Authors:
Jialin Dou; Marcelo Cintra
Affiliations:
ARM Ltd., Cambridge, UK; University of Edinburgh, Edinburgh, UK
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2007

Abstract

Speculative parallelization allows code sections that the compiler cannot fully analyze to be executed aggressively in parallel. Although it can deliver significant speedups, several overheads associated with the technique can limit those speedups in practice. This paper proposes a novel static compiler cost model of speculative multithreaded execution that can be used to predict the resulting performance. The model attempts to predict the expected speedup, or slowdown, of each candidate speculative section by estimating the combined runtime effects of the various overheads while taking into account the scheduling restrictions of most speculative execution environments. It is based on estimating the likely execution duration of threads and considers all the possible permutations of these threads. Unlike prior heuristics, which only qualitatively estimate the benefits of speculative multithreaded execution, the model produces a quantitative estimate of the speedup. In previous work, a limited version of the framework was evaluated on a number of loops from a collection of SPEC benchmarks that suffer mainly from load imbalance and from thread dispatch and commit overheads. In this work, an extended framework is also evaluated on loops that may suffer from data-dependence violations. Experimental results show that prediction accuracy is lower when loops with violations are included. Nevertheless, accuracy is still very high for a static model: the framework identifies, on average, 45% of the loops that cause slowdowns and 96% of the loops that lead to speedups. It predicts the speedups or slowdowns with an error of less than 20% for an average of 28% of the loops across the benchmarks, and with an error of less than 50% for an average of 80% of the loops. Overall, the framework often outperforms, by as much as 25%, a naive approach that attempts to speculatively parallelize all the loops considered, and it curbs the large slowdowns that this naive approach causes in many cases.
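
The idea of estimating thread durations, enumerating their possible combinations, and deriving a quantitative speedup can be illustrated with a small sketch. This is a toy under stated assumptions and not the model from the paper. It assumes thread durations are independent draws from a discrete distribution, threads are assigned round-robin to processors, commits happen in order, a processor is held until its thread commits, and no data-dependence violations occur. The names `parallel_time`, `expected_speedup`, and `should_parallelize` are hypothetical.

```python
from itertools import product
from math import prod


def parallel_time(durations, n_procs, dispatch=0.0, commit=0.0):
    """Completion time of one loop instance under the toy scheduling rules."""
    proc_free = [0.0] * n_procs  # time at which each processor becomes idle
    prev_commit = 0.0            # commit time of the preceding thread
    for i, d in enumerate(durations):
        p = i % n_procs          # assumption: round-robin thread placement
        finish = proc_free[p] + dispatch + d
        # In-order commit: a thread waits for its predecessor to commit.
        commit_time = max(finish, prev_commit) + commit
        proc_free[p] = commit_time  # assumption: processor held until commit
        prev_commit = commit_time
    return prev_commit


def expected_speedup(dist, n_threads, n_procs, dispatch=0.0, commit=0.0):
    """Expected sequential time divided by expected speculative time.

    dist is a list of (duration, probability) pairs for a single thread.
    Every combination of durations across the threads is enumerated, which
    is exponential in n_threads and only practical for small cases.
    """
    exp_seq = 0.0
    exp_par = 0.0
    for combo in product(dist, repeat=n_threads):
        prob = prod(p for _, p in combo)
        durations = [d for d, _ in combo]
        exp_seq += prob * sum(durations)
        exp_par += prob * parallel_time(durations, n_procs, dispatch, commit)
    return exp_seq / exp_par


def should_parallelize(dist, n_threads, n_procs, dispatch=0.0, commit=0.0):
    """Select a loop only if the toy model predicts a speedup above 1."""
    return expected_speedup(dist, n_threads, n_procs, dispatch, commit) > 1.0


if __name__ == "__main__":
    # Hypothetical loop body: a short path and a long path, equally likely.
    body = [(2.0, 0.5), (10.0, 0.5)]
    print(expected_speedup(body, n_threads=4, n_procs=2))
    print(should_parallelize(body, 4, 2, dispatch=5.0, commit=5.0))
```

In this sketch, load imbalance shows up as threads that finish early but must wait to commit, and dispatch and commit overheads are added once per thread. A selection policy based on `should_parallelize` would skip loops with a predicted slowdown, whereas a naive policy would parallelize every candidate loop.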