Dynamic predication has been proposed to reduce the branch misprediction penalty caused by hard-to-predict branch instructions. A recently proposed dynamic predication architecture, the diverge-merge processor (DMP), provides large performance improvements by dynamically predicating a large set of complex control-flow graphs that result in branch mispredictions. DMP requires significant support from a profiling compiler to determine which branch instructions and control-flow structures can be dynamically predicated. However, previous work on dynamic predication did not extensively examine the tradeoffs involved in profiling and code generation for dynamic predication architectures. This paper describes compiler support for obtaining high performance in the diverge-merge processor. We describe new profile-driven algorithms and heuristics to select branch instructions that are suitable and profitable for dynamic predication. We also develop a new profile-based analytical cost-benefit model to estimate, at compile time, the performance benefits of dynamically predicating different types of control-flow structures, including complex hammocks and loops. Our evaluations show that, with our optimized compiler algorithms, DMP provides a 20.4% average performance improvement over a conventional processor on SPEC integer benchmarks, whereas the best-performing alternative simple compiler algorithm yields an average improvement of 4.5%. We also find that, with the proposed algorithms, DMP performance is not significantly affected by differences between the profile-time and run-time input data sets.
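To make the idea of a profile-based cost-benefit decision concrete, the following is a minimal sketch and not the model developed in the paper. It assumes a simple hammock that is always predicated once selected, and it weighs the expected flush cycles saved against the fetch cycles wasted on the path that turns out not to be needed. The names `BranchProfile`, `predication_gain`, and `select_branches`, as well as the default flush penalty and fetch width, are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class BranchProfile:
    """Hypothetical per-branch profile data; every field is an assumption."""
    mispredict_rate: float      # fraction of executions that mispredict (0..1)
    taken_prob: float           # probability that the branch is taken (0..1)
    taken_path_insts: int       # instructions on the taken path up to the merge point
    not_taken_path_insts: int   # instructions on the not-taken path up to the merge point


def predication_gain(p, flush_penalty_cycles=20.0, fetch_width=4.0):
    """Estimated cycles saved per branch execution if the branch is predicated.

    Positive means predication looks profitable under this toy model.
    """
    # Expected cycles lost to pipeline flushes when the branch is left alone.
    branch_cost = p.mispredict_rate * flush_penalty_cycles
    # When predicated, both paths are fetched; the instructions on the path
    # that was not actually needed are wasted work.
    wasted_insts = (p.taken_prob * p.not_taken_path_insts
                    + (1.0 - p.taken_prob) * p.taken_path_insts)
    overhead = wasted_insts / fetch_width
    return branch_cost - overhead


def select_branches(profiles, **model_params):
    """Return the names of branches whose estimated gain is positive."""
    return [name for name, p in profiles.items()
            if predication_gain(p, **model_params) > 0]


if __name__ == "__main__":
    profiles = {
        "hard_short_hammock": BranchProfile(0.30, 0.5, 6, 6),
        "easy_long_hammock": BranchProfile(0.01, 0.5, 40, 40),
    }
    print(select_branches(profiles))
```

Under these assumed parameters, a frequently mispredicted branch guarding a short hammock is selected, while a well-predicted branch guarding long paths is not. A realistic model would also have to cover loops, multiple merge points, and confidence estimation, which this sketch omits.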