Mixed speculative multithreaded execution models

Authors:
Polychronis Xekalakis;Nikolas Ioannou;Marcelo Cintra
Affiliations:
University of Edinburgh, Spain;University of Edinburgh, Switzerland;University of Edinburgh, UK
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2012

Abstract

The current trend toward multicore architectures has placed great pressure on programmers and compilers to generate thread-parallel programs. Improved execution performance can no longer be obtained through traditional single-thread instruction-level parallelism (ILP) and must instead come from multithreaded execution, that is, from thread-level parallelism (TLP). One notable technique that facilitates the extraction of parallel threads from sequential applications is thread-level speculation (TLS). This technique allows programmers and compilers to generate threads without checking for inter-thread data and control dependences, which are then transparently enforced by the hardware. Most prior work on TLS has concentrated on thread selection and on mechanisms that efficiently support the main TLS operations, such as squashes, data versioning, and commits. This article seeks to enhance TLS functionality by combining it with other speculative multithreaded execution models. The main idea is that TLS already requires extensive hardware support, which, when slightly augmented, can accommodate other speculative multithreaded techniques. Because the bottlenecks differ across applications, and even across phases of the same program, it is reasonable to expect that a more versatile system will execute a given program more efficiently. To this end, we first show that mixed execution models that combine TLS with Helper Threads (HT), RunAhead execution (RA), and MultiPath execution (MP) perform better than any of the models alone. Using a simple model that we propose, we show that the benefits come from extracting additional ILP without harming the TLP extracted by TLS. We then show that unifying all of these speculative multithreaded models into a single execution model further enhances ILP at only minimal additional hardware cost.
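
The TLS operations named in the abstract (data versioning, squashes, and commits) can be illustrated with a small software sketch. This is not the hardware design evaluated in the article. It is a toy model under stated assumptions: tasks are deterministic, they touch memory only through the hypothetical `load` and `store` callbacks, and a dependence violation is detected by comparing read values at commit time, whereas real TLS hardware typically tracks addresses. The names `run_speculatively`, `execute`, `load`, and `store` are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Memory = Dict[str, int]
Load = Callable[[str], int]
Store = Callable[[str, int], None]
Task = Callable[[Load, Store], None]


def execute(task: Task, base: Memory) -> Tuple[Memory, Memory]:
    """Run one task against `base`, buffering its writes (data versioning)."""
    reads: Memory = {}   # first value observed for each location read from `base`
    writes: Memory = {}  # speculative version, invisible to other tasks

    def load(addr: str) -> int:
        if addr in writes:           # a task sees its own buffered writes
            return writes[addr]
        value = base[addr]
        reads.setdefault(addr, value)
        return value

    def store(addr: str, value: int) -> None:
        writes[addr] = value

    task(load, store)
    return reads, writes


def run_speculatively(memory: Memory, tasks: List[Task]) -> Tuple[Memory, int]:
    """Execute tasks as if in parallel, then commit them in program order.

    Returns the final memory and the number of squashed tasks.
    """
    # Speculative phase: every task runs against the same initial snapshot,
    # without checking for dependences on earlier tasks.
    speculative = [execute(task, memory) for task in tasks]

    committed = dict(memory)
    squashes = 0
    for task, (reads, writes) in zip(tasks, speculative):
        # Violation: an earlier task committed a value this task read too early.
        if any(committed[addr] != value for addr, value in reads.items()):
            squashes += 1
            reads, writes = execute(task, committed)  # squash and re-execute
        committed.update(writes)                      # commit in program order
    return committed, squashes


if __name__ == "__main__":
    tasks: List[Task] = [
        lambda load, store: store("b", load("a") + 1),
        lambda load, store: store("c", load("b") * 10),  # depends on task 0
        lambda load, store: store("a", load("a") + 5),   # independent read
    ]
    print(run_speculatively({"a": 1, "b": 0, "c": 0}, tasks))
```

In this sketch only the second task is squashed, because it read `b` before the first task committed its update. The committed result matches what sequential execution of the three tasks would produce, which is the guarantee that TLS enforces transparently.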