Ginger: control independence using tag rewriting

Authors:
Andrew D. Hilton;Amir Roth
Affiliations:
University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA
Venue:
Proceedings of the 34th annual international symposium on Computer architecture
Year:
2007

Citing 31
Cited 4

Executing a Program on the MIT Tagged-Token Dataflow Architecture

IEEE Transactions on Computers
Limits of control flow on parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Guarded execution and branch prediction in dynamic ILP processors

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Disjoint eager execution: an optimal form of speculative execution

Proceedings of the 28th annual international symposium on Microarchitecture
Assigning confidence to conditional branch predictions

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic instruction reuse

Proceedings of the 24th annual international symposium on Computer architecture
Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Selective eager execution on the PolyPath architecture

Proceedings of the 25th annual international symposium on Computer architecture
Reducing branch misprediction penalties via dynamic control independence detection

ICS '99 Proceedings of the 13th international conference on Supercomputing
Control independence in trace processors

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A large, fast instruction window for tolerating cache misses

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Skipper: a microarchitecture for exploiting control-flow independence

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Instruction fetch deferral using static slack

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Predicate prediction for efficient out-of-order execution

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
A Study of Control Independence in Superscalar Processors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Dynamic Hammock Predication for Non-Predicated Instruction Set Architectures

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Parallelism in the front-end

Proceedings of the 30th annual international symposium on Computer architecture
Out-of-order fetch, decode, and issue

Out-of-order fetch, decode, and issue
Checkpoint Processing and Recovery: Towards Scalable Large Instruction Window Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Memory Ordering: A Value-Based Approach

Proceedings of the 31st annual international symposium on Computer architecture
Continual flow pipelines

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Control Flow Optimization Via Dynamic Reconvergence Prediction

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Piecewise Linear Branch Prediction

Proceedings of the 32nd annual international symposium on Computer Architecture
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization

Proceedings of the 32nd annual international symposium on Computer Architecture
Reducing Branch Misprediction Penalty via Selective Branch Recovery

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Scalable Store-Load Forwarding via Store Queue Index Prediction

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
POWER5 System microarchitecture

IBM Journal of Research and Development - POWER5 and packaging
Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture

Fetch-Criticality Reduction through Control Independence

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
RETCON: transactional repair without replay

Proceedings of the 37th annual international symposium on Computer architecture
SYRANT: SYmmetric resource allocation on not-taken and taken paths

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Distributed replay protocol for distributed uniprocessors

Proceedings of the 26th ACM international conference on Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

The negative performance impact of branch mis-predictions can be reduced by exploiting control independence (CI). When a branch mis-predicts, the wrong-path instructions up to the point where control converges with the correct path are selectively squashed and replaced with correct-path instructions. Instructions beyond the convergence-point-the branch's control-independent (CI) instructions-are spared from squashing. Exploiting CI requires updating the input data dependences of CI instructions to reflect the selective removal and insertion of logically older instructions and transitively re-dispatching those CI instructions whose inputs have changed. This capability is generally called out-of-order renaming. Previously proposed CI designs use out-of-order renaming schemes that either consume excessive rename/dispatch bandwidth, can only be applied in limited cases, or incur a cost even when the branch would be correctly predicted. Ginger is a CI design that is both general and bandwidth efficient. Ginger implements out-of-order renaming using tag rewriting, re-linking the input dependences of CI instructions as they sit in the window. To do this, Ginger halts the pipeline uses the idle map table read and write ports and the issue queue match lines and write lines to perform a register-tag "search-and-replace" operation. After a few cycles, the pipeline restarts and execution resumes with correct data dependences. Cycle-level simulation shows that Ginger out-performs previous CI designs, yielding geometric mean speedups over an aggressive non-CI processor of 5%, 12%, and 11%-on SPECint2000, MediaBench, and Comm-Bench-with speedups of 15% or greater on 11 of 46 programs.