To fully utilize the wide machine resources of modern high-performance microprocessors, it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction-level parallelism by allowing instructions from different basic blocks to be converted to straight-line code guarded by boolean predicates. However, predicated execution also presents significant challenges to an optimizing compiler. For example, in live range analysis, a predicated definition does not necessarily end the live range of a virtual register. This paper describes techniques for analyzing the relations among predicates in order to improve the precision and effectiveness of various compiler analysis and transformation phases in the presence of predicated code. Our predicate analysis operates globally to obtain relations among predicates. Moreover, we analyze control flow and predication in a single unified framework. The result can be queried by subsequent optimization and analysis phases. Based on this framework, we extend a traditional register allocation method to a predicate-aware register allocator that takes global predicate relations into account. We have implemented the proposed algorithms, which effectively reduce register pressure. Our experimental results show that 24.6% of a large test suite obtains, on average, 20.71% better register allocation due to the algorithms presented in this paper.
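The live-range point can be illustrated with a minimal sketch. It is not the paper's algorithm, only a toy backward liveness scan over straight-line predicated code. The assumptions are all hypothetical: a single branch condition, predicates modeled as sets of "worlds" (0 = condition false, 1 = condition true), and the names `TRUE`, `P`, `Q`, and `live_before` invented for illustration. A definition guarded by a predicate kills the register only in the worlds where that predicate holds.

```python
# Toy model (assumption): one branch condition, so two possible worlds.
TRUE = frozenset({0, 1})  # always-true guard
P = frozenset({1})        # predicate p: condition is true
Q = frozenset({0})        # predicate q: condition is false (complement of p)

def live_before(instrs, live_out):
    """Backward scan over (guard, defs, uses) triples.

    Returns, for each instruction, a map from register name to the set
    of worlds in which that register is live just before the instruction.
    """
    live = dict(live_out)
    result = []
    for guard, defs, uses in reversed(instrs):
        for reg in defs:
            # A guarded definition kills the register only where the guard holds.
            remaining = live.get(reg, frozenset()) - guard
            if remaining:
                live[reg] = remaining
            else:
                live.pop(reg, None)
        for reg in uses:
            # A guarded use makes the register live where the guard holds.
            live[reg] = live.get(reg, frozenset()) | guard
        result.append(dict(live))
    result.reverse()
    return result

# Hypothetical predicated code:
#   0: (p) x = 1
#   1: (q) x = 2
#   2:     y = x + 1
prog = [
    (P, ["x"], []),
    (Q, ["x"], []),
    (TRUE, ["y"], ["x"]),
]
before = live_before(prog, {"y": TRUE})
for index, live in enumerate(before):
    print(index, {reg: sorted(worlds) for reg, worlds in live.items()})
```

In this sketch, the definition at instruction 1 does not end the live range of `x`: the register stays live under `p` until instruction 0 defines it there. An analysis that knows `p` and `q` are complementary can conclude that `x` is dead before instruction 0, whereas one that ignores predicate relations would have to keep it live.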