Superscalar architectures have been proposed that exploit control independence, reducing the performance penalty of branch mispredictions by preserving the work of future misprediction-independent instructions. The essential goal of exploiting control independence is to completely decouple future misprediction-independent instructions from deferred misprediction-dependent instructions. Current implementations fall short of this goal because they explicitly maintain program order among misprediction-independent and misprediction-dependent instructions. Explicit approaches sacrifice design efficiency and ultimately performance. We observe that it is sufficient to emulate program order. Potential misprediction-dependent instructions are singled out a priori and their unchanging source values are checkpointed. These instructions and values are set aside as a "recovery program". Checkpointed source values break the data dependencies with co-mingled misprediction-independent instructions (by then long gone from the pipeline), achieving the essential decoupling objective. When the mispredicted branch resolves, recovery is achieved by fetching the self-sufficient, condensed recovery program. Recovery is effectively transparent to the pipeline, in that speculative state is not rolled back and recovery appears as a jump to code. A coarse-grain retirement substrate permits the relaxed order between the decoupled programs. Transparent control independence (TCI) yields a highly streamlined pipeline that quickly recycles resources based on conventional speculation, enabling a large window with small cycle-critical resources, and prevents many mispredictions from disrupting this large window. TCI achieves speedups as high as 64% (16% average) and 88% (22% average) for 4-issue and 8-issue pipelines, respectively, among 15 SPEC integer benchmarks. Factors that limit the performance of explicitly ordered approaches are quantified.
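The recovery-program idea can be illustrated with a small software model. The following is only a sketch under simplifying assumptions, not the hardware mechanism described above: registers are dictionary entries, an instruction is a hypothetical `(dest, fn, srcs)` tuple, both paths of the branch are known when the poisoned set is formed, and the names `speculate`, `recover`, and `demo` are invented for illustration. It shows that once the unchanging source values are checkpointed, the dependent instructions can be replayed later without rolling back the independent work.

```python
import operator

# An instruction is (dest, fn, srcs), meaning dest = fn(*values of srcs).
def run(program, regs):
    for dest, fn, srcs in program:
        regs[dest] = fn(*(regs[s] for s in srcs))
    return regs

def speculate(regs, wrong_path, right_path, tail):
    """Run the mispredicted path and the tail, setting aside a recovery program."""
    region = wrong_path + right_path
    # A priori, any register written on either path is potentially wrong.
    poisoned = {dest for dest, _, _ in region}
    touched = poisoned | {s for _, _, srcs in region for s in srcs}
    branch_ckpt = {r: regs[r] for r in touched}  # pre-branch values
    run(wrong_path, regs)
    recovery = []
    for instr in tail:
        dest, _, srcs = instr
        if any(s in poisoned for s in srcs):
            # Misprediction-dependent: checkpoint its unchanging sources now.
            saved = {s: regs[s] for s in srcs if s not in poisoned}
            recovery.append((instr, saved))
            poisoned.add(dest)
        else:
            # Misprediction-independent: its result is final.
            poisoned.discard(dest)
        run([instr], regs)
    return branch_ckpt, recovery, poisoned

def recover(regs, branch_ckpt, right_path, recovery, poisoned):
    """Replay the correct path and the recovery program; no rollback of regs."""
    env = run(right_path, dict(branch_ckpt))
    for (dest, fn, srcs), saved in recovery:
        env[dest] = fn(*(saved[s] if s in saved else env[s] for s in srcs))
    for r in poisoned:  # only registers still poisoned at the end are patched
        regs[r] = env[r]
    return regs

def demo():
    init = {"r1": 5, "r2": 7, "r3": 0, "r4": 0, "r5": 0, "r6": 0}
    taken = [("r3", lambda a: a + 1, ("r1",))]      # predicted, wrong
    not_taken = [("r3", lambda a: a * 2, ("r1",))]  # actual path
    tail = [
        ("r4", lambda a: a + 10, ("r2",)),   # independent
        ("r5", operator.add, ("r3", "r4")),  # dependent; r4 is checkpointed
        ("r4", lambda a: a * 2, ("r4",)),    # independent; overwrites r4
        ("r6", operator.add, ("r5", "r2")),  # dependent through r5
    ]
    spec = dict(init)
    ckpt, recovery, poisoned = speculate(spec, taken, not_taken, tail)
    recover(spec, ckpt, not_taken, recovery, poisoned)
    reference = run(not_taken + tail, dict(init))  # in-order, correct path
    return spec, reference, [instr[0] for instr, _ in recovery]
```

In this example the third tail instruction overwrites `r4` after the dependent instruction has read it, so replay succeeds only because the earlier value of `r4` was saved alongside that instruction. The state after recovery matches an in-order run down the correct path, even though the independent instructions were never re-executed.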