This paper presents a novel microarchitectural technique for accurately predicting control-flow reconvergence dynamically. A reconvergence point is the earliest dynamic instruction in the program at which program paths can be expected to reconverge, regardless of the outcome or target of the current branch. Thus, even if the control flow immediately after a branch is uncertain, execution from the reconvergence point onward is certain. The proposed hardware reconvergence predictor is both implementable and accurate: a 4 KB predictor achieves more than 95% accuracy on SPEC INT, and larger implementations achieve greater than 99% accuracy. The information provided by reconvergence prediction can increase the effectiveness of a range of previously proposed performance optimizations, including speculative multithreading, control independence, and squash reuse. This paper also demonstrates a new technique that uses the dynamic reconvergence prediction information to predict a wrong-path excursion ahead of branch resolution. On average, 34% of wrong-path fetches are eliminated.
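To make the notion of a reconvergence point concrete, the sketch below computes a static analog of it: the immediate post-dominator of a branch in a small control-flow graph. This is only an illustration under stated assumptions. The abstract does not describe how the hardware predictor works, and the predictor operates dynamically rather than on a known graph. The function names, the block labels, and the example graph are all hypothetical.

```python
from typing import Dict, List, Optional, Set

Cfg = Dict[str, List[str]]  # basic block -> successor blocks (hypothetical encoding)


def post_dominators(cfg: Cfg, exit_node: str) -> Dict[str, Set[str]]:
    """Iteratively compute the post-dominator set of every block."""
    nodes = set(cfg)
    # Start from the most optimistic assumption and shrink to a fixed point.
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == exit_node or not cfg[n]:
                continue
            # A block is post-dominated by itself plus whatever
            # post-dominates all of its successors.
            new = set.intersection(*(pdom[s] for s in cfg[n])) | {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def reconvergence_point(cfg: Cfg, exit_node: str, branch: str) -> Optional[str]:
    """Return the immediate post-dominator of `branch`, or None if there is none."""
    pdom = post_dominators(cfg, exit_node)
    strict = pdom[branch] - {branch}
    for candidate in strict:
        # The nearest strict post-dominator is the one post-dominated
        # by exactly the strict post-dominators of the branch.
        if pdom[candidate] == strict:
            return candidate
    return None


# Hypothetical if-then-else: A branches to B or C, and both rejoin at D.
example_cfg: Cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
print(reconvergence_point(example_cfg, "E", "A"))
```

In this example, whichever way the branch in block A resolves, control reaches block D, so work starting at D is control independent of that branch. A dynamic predictor must infer the same kind of information at run time without access to the graph.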