Global predicate analysis and its application to register allocation

Authors:
David M. Gillies;Dz-ching Roy Ju;Richard Johnson;Michael Schlansker
Affiliations:
Hewlett-Packard California Language Lab, 11000 Wolfe Road, Cupertino, CA;Hewlett-Packard California Language Lab, 11000 Wolfe Road, Cupertino, CA;Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA;Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA
Venue:
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Year:
1996

Abstract

To fully utilize the wide machine resources in modern high-performance microprocessors, it is necessary to exploit parallelism beyond individual basic blocks. Architectural support for predicated execution increases the degree of instruction-level parallelism by allowing instructions from different basic blocks to be converted to straight-line code guarded by boolean predicates. However, predicated execution also presents significant challenges to an optimizing compiler. For example, in live range analysis, a predicated definition does not necessarily end the live range of a virtual register, because the definition takes effect only when its guarding predicate is true. This paper describes techniques to analyze the relations among predicates in order to improve the precision and effectiveness of various compiler analysis and transformation phases in the presence of predicated code. Our predicate analysis operates globally to obtain relations among predicates. Moreover, we analyze control flow and predication in a single unified framework. The result can be queried by subsequent optimization and analysis phases. Based on this framework, we extend a traditional method to a predicate-aware register allocator that takes global predicate relations into account. We have implemented the proposed algorithms to effectively reduce register pressure. Our experimental results show that 24.6% of a large test suite obtains, on average, 20.71% better register allocation due to the algorithms presented in this paper.
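
To make the live-range point concrete, here is a minimal sketch in Python of a predicate-aware backward liveness scan over straight-line predicated code. It is an illustration under stated assumptions, not the algorithm from the paper: each predicate is modeled simply as the set of execution paths on which it is true, and the names `Instr`, `live_before`, `interferes`, `P`, `Q`, and `TRUE` are all hypothetical. A definition guarded by a predicate removes liveness only on the paths that predicate covers, and two virtual registers are treated as interfering only if they are live together on at least one common path.

```python
from dataclasses import dataclass

# Assumption: a predicate is modeled as the set of paths on which it is true.
TRUE = frozenset({"then", "else"})
P = frozenset({"then"})   # hypothetical predicate p
Q = frozenset({"else"})   # hypothetical predicate q, disjoint from p


@dataclass(frozen=True)
class Instr:
    guard: frozenset      # predicate guarding the instruction
    defs: tuple           # virtual registers written
    uses: tuple           # virtual registers read


def live_before(instrs, live_out):
    """Backward scan; returns one map (vreg -> live paths) per program point,
    where entry i describes liveness just before instruction i."""
    live = {v: frozenset(c) for v, c in live_out.items()}
    result = []
    for ins in reversed(instrs):
        for d in ins.defs:
            # A guarded definition kills liveness only where its guard holds.
            remaining = live.get(d, frozenset()) - ins.guard
            if remaining:
                live[d] = remaining
            else:
                live.pop(d, None)
        for u in ins.uses:
            # A guarded use makes the register live where its guard holds.
            live[u] = live.get(u, frozenset()) | ins.guard
        result.append(dict(live))
    result.reverse()
    return result


def interferes(a, b, live_maps, predicate_aware=True):
    """True if a and b are simultaneously live at some program point.
    With predicate_aware=True they must also share at least one path."""
    for live in live_maps:
        if a in live and b in live:
            if not predicate_aware or (live[a] & live[b]):
                return True
    return False


program = [
    Instr(P, ("x",), ()),        # (p) x = ...
    Instr(Q, ("y",), ()),        # (q) y = ...
    Instr(P, ("z",), ("x",)),    # (p) z = f(x)
    Instr(Q, ("z",), ("y",)),    # (q) z = g(y)
    Instr(TRUE, (), ("z",)),     #     use z
]
maps = live_before(program, live_out={})

print(interferes("x", "y", maps, predicate_aware=False))
print(interferes("x", "y", maps, predicate_aware=True))
```

In this sketch, `x` and `y` overlap textually, so a predicate-unaware check reports interference, while the predicate-aware check does not because their guards are disjoint. Likewise, the definition of `z` under `Q` leaves `z` live on the `"then"` path just before that instruction. A production analysis would need a symbolic representation of predicate relations rather than explicit path sets.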