If-conversion and predicated execution are widely used to eliminate branch misprediction penalties. Existing predicated execution schemes depend on the compiler to generate explicitly predicated instructions. This paper discusses RIMP (Runtime IMplicit Predication), a trace-based predication mechanism that identifies if-conversion candidates during dynamic execution. A conventional trace cache is modified to store RIMP traces, which include instructions from both the fall-through block and the target block that follow a conditional branch. A hardware extension then adds predication to each RIMP trace automatically. With RIMP, legacy applications can benefit from predication without recompiling their source code. The paper presents simulations of a RIMP implementation under diverse microarchitecture configurations, and the results show promising performance improvements: in general, RIMP with 64 kB of trace storage delivers an average IPC improvement of 10.3% and speeds up execution by over 7%.
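The idea of a trace that holds both sides of a conditional branch can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism the paper evaluates: the names Instr, build_trace and run_trace are hypothetical, the predicate is modeled as the branch outcome each instruction requires, and instructions whose predicate does not match are simply skipped.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Regs = Dict[str, int]


@dataclass
class Instr:
    """Hypothetical instruction: writes op(regs) into register dest."""
    dest: str
    op: Callable[[Regs], int]
    # Assumed encoding of the implicit predicate:
    # None = always execute, True/False = branch outcome required to commit.
    pred: Optional[bool] = None


def build_trace(fall_through: List[Instr], target: List[Instr]) -> List[Instr]:
    """Merge both paths after a branch into one trace, tagging each
    instruction with the branch outcome under which it may commit."""
    trace = [Instr(i.dest, i.op, False) for i in fall_through]
    trace += [Instr(i.dest, i.op, True) for i in target]
    return trace


def run_trace(trace: List[Instr], regs: Regs, taken: bool) -> Regs:
    """Execute the merged trace; instructions on the wrong path are nullified."""
    out = dict(regs)
    for ins in trace:
        if ins.pred is None or ins.pred == taken:
            out[ins.dest] = ins.op(out)
    return out


# Source-level shape: if (a > b) max = a; else max = b;
# The branch is taken when a > b, so "max = a" is the target block.
fall_through = [Instr("max", lambda r: r["b"])]
target = [Instr("max", lambda r: r["a"])]
trace = build_trace(fall_through, target)

regs = {"a": 3, "b": 7, "max": 0}
result = run_trace(trace, regs, taken=regs["a"] > regs["b"])
print(result["max"])
```

In this model the same trace serves either branch outcome, so no refetch is needed when the outcome differs from what was expected; the cost is that instructions from the path not taken still occupy slots in the trace.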