Abstract: Achieving higher processor performance requires greater synergy between advanced hardware features and innovative compiler techniques. Recent advances in compilation techniques for predicated execution have provided significant opportunities for exploiting instruction-level parallelism. However, little research has been done on how to efficiently execute predicated code in a dynamic microarchitecture. In this paper, we evaluate hardware optimizations for executing predicated code on a dynamically scheduled microarchitecture. We provide two novel ideas to improve the efficiency of executing predicated code. On a generic Intel Itanium processor pipeline model, we demonstrate that, with some microarchitecture enhancements, a dynamic execution processor can achieve about 16% performance improvement over an equivalent static execution processor.