Transparent control independence (TCI)

Authors:
Ahmed S. Al-Zawawi;Vimal K. Reddy;Eric Rotenberg;Haitham H. Akkary
Affiliations:
North Carolina State University, Raleigh, NC;North Carolina State University, Raleigh, NC;North Carolina State University, Raleigh, NC;Intel Corporation, Hilllsboro, OR
Venue:
Proceedings of the 34th annual international symposium on Computer architecture
Year:
2007

Citing 25
Cited 9

Checkpoint repair for high-performance out-of-order execution machines

IEEE Transactions on Computers
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A comparison of full and partial predicated execution support for ILP processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Register renaming and dynamic speculation: an alternative approach

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
Multipath execution: opportunities and limits

ICS '98 Proceedings of the 12th international conference on Supercomputing
Memory dependence prediction using store sets

Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution

Proceedings of the 25th annual international symposium on Computer architecture
Selective eager execution on the PolyPath architecture

Proceedings of the 25th annual international symposium on Computer architecture
A dynamic multithreading processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Reducing branch misprediction penalties via dynamic control independence detection

ICS '99 Proceedings of the 13th international conference on Supercomputing
Control independence in trace processors

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Skipper: a microarchitecture for exploiting control-flow independence

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Study of Control Independence in Superscalar Processors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Instruction Recycling on a Multiple-Path Processor

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Dynamic Hammock Predication for Non-Predicated Instruction Set Architectures

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Dynamic Branch Prediction with Perceptrons

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Control Flow Optimization Via Dynamic Reconvergence Prediction

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Out-of-Order Commit Processors

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Reducing Branch Misprediction Penalty via Selective Branch Recovery

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture

Fetch-Criticality Reduction through Control Independence

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
On the potential of latency tolerant execution in speculative multithreading

IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
RETCON: transactional repair without replay

Proceedings of the 37th annual international symposium on Computer architecture
The potential of using dynamic information flow analysis in data value prediction

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
SYRANT: SYmmetric resource allocation on not-taken and taken paths

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Leveraging Strength-Based Dynamic Information Flow Analysis to Enhance Data Value Prediction

ACM Transactions on Architecture and Code Optimization (TACO)
Distributed replay protocol for distributed uniprocessors

Proceedings of the 26th ACM international conference on Supercomputing
Simultaneous branch and warp interweaving for sustained GPU performance

Proceedings of the 39th Annual International Symposium on Computer Architecture
Disjoint out-of-order execution processor

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Superscalar architectures have been proposed that exploit control independence, reducing the performance penalty of branch mispredictions by preserving the work of future misprediction-independent instructions. The essential goal of exploiting control independence is to completely decouple future misprediction-independent instructions from deferred misprediction-dependent instructions. Current implementations fall short of this goal because they explicitly maintain program order among misprediction-independent and misprediction-dependent instructions. Explicit approaches sacrifice design efficiency and ultimately performance. We observe it is sufficient to emulate program order. Potential misprediction-dependent instructions are singled out a priori and their unchanging source values are checkpointed. These instructions and values are set aside as a "recovery program". Checkpointed source values break the data dependencies with co-mingled misprediction-independent instructions - now long since gone from the pipeline - achieving the essential decoupling objective. When the mispredicted branch resolves, recovery is achieved by fetching the self-sufficient, condensed recovery program. Recovery is effectively transparent to the pipeline, in that speculative state is not rolled back and recovery appears as a jump to code. A coarse-grain retirement substrate permits the relaxed order between the decoupled programs. Transparent control independence (TCI) yields a highly streamlined pipeline that quickly recycles resources based on conventional speculation, enabling a large window with small cycle-critical resources, and prevents many mispredictions from disrupting this large window. TCI achieves speedups as high as 64% (16% average) and 88% (22% average) for 4-issue and 8-issue pipelines, respectively, among 15 SPEC integer benchmarks. Factors that limit the performance of explicitly ordered approaches are quantified.