Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Y-Pipe: a conditional branching scheme without pipeline delays
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A comparison of dynamic branch predictors that use two levels of branch history
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The anatomy of the register file in a multiscalar processor
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The effect of speculatively updating branch history on branch prediction accuracy, revisited
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Assigning confidence to conditional branch predictions
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Confidence estimation for speculation control
Proceedings of the 25th annual international symposium on Computer architecture
Integrated predicated and speculative execution in the IMPACT EPIC architecture
Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution
Proceedings of the 25th annual international symposium on Computer architecture
Selective eager execution on the PolyPath architecture
Proceedings of the 25th annual international symposium on Computer architecture
Task selection for a multiscalar processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A dynamic multithreading processor
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Reducing branch misprediction penalties via dynamic control independence detection
ICS '99 Proceedings of the 13th international conference on Supercomputing
Control independence in trace processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Instruction fetch mechanisms for multipath execution processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A Study of Control Independence in Superscalar Processors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Dynamic Hammock Predication for Non-Predicated Instruction Set Architectures
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Realizing High IPC Using Time-Tagged Resource-Flow Computing
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Instruction fetch deferral using static slack
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Implicitly-multithreaded processors
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
Control Flow Optimization Via Dynamic Reconvergence Prediction
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Control-Flow Independence Reuse via Dynamic Vectorization
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
A serializability violation detector for shared-memory server programs
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A simple speculative load control mechanism for energy saving
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
BranchTap: improving performance with very few checkpoints through adaptive speculation control
Proceedings of the 20th annual international conference on Supercomputing
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Ginger: control independence using tag rewriting
Proceedings of the 34th annual international symposium on Computer architecture
Transparent control independence (TCI)
Proceedings of the 34th annual international symposium on Computer architecture
IEEE Transactions on Computers
Energy saving through a simple load control mechanism
ACM SIGARCH Computer Architecture News
Improving single-thread performance with fine-grain state maintenance
Proceedings of the 5th conference on Computing frontiers
Fetch-Criticality Reduction through Control Independence
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Reexecution and Selective Reuse in Checkpoint Processors
Transactions on High-Performance Embedded Architectures and Compilers II
SYRANT: SYmmetric resource allocation on not-taken and taken paths
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Although modern superscalar processors achieve high branch prediction accuracy, certain branches either are inherently difficult to predict or incur destructive interference in prediction tables, causing significant performance loss due to mispredictions. We propose a novel microarchitecture, called Skipper, to handle such difficult branches by exploiting control-flow independence. Previous approaches to handling difficult branches, one way or another, amount to executing incorrect instructions, squandering cycles and resources such as the i-cache bandwidth. Skipper altogether avoids incorrect instructions by skipping over, without even fetching, the control-flow dependent computation conditioned by a difficult branch. Instead, Skipper fetches and executes the control-flow independent instructions, which are past the point where the branch's taken and not-taken paths reconverge, and which need to be executed irrespective of the branch outcome. Because Skipper executes the correct control-flow dependent instructions after the difficult branch is resolved, it conserves the valuable resources.Skipper is the first proposal to exploit control-flow independence by skipping over control-flow dependent computation in a superscalar pipeline. Skipper fetches the skipped control-flow dependent instructions after the post-reconvergent instructions, out of program order. We describe key mechanisms to implement Skipper without unduly complicating the pipeline despite out-of-order fetch. SPECint95 simulations show that Skipper performs 10% and 8% better than superscalar and the previously-proposed Polypath, respectively, when all three microarchitectures have equal i-cache bandwidth and hardware resources.