Handling irreducible loops: optimized node splitting versus DJ-graphs

Authors:
Sebastian Unger;Frank Mueller
Affiliations:
North Carolina State University, Raleigh, NC;North Carolina State University, Raleigh, NC
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
2002


Abstract

This paper addresses the question of how to handle irreducible regions during optimization, a question that has become more relevant for contemporary processors because recent VLIW-like architectures rely heavily on instruction scheduling. The contributions of this paper are twofold. First, a method of optimized node splitting that transforms irreducible regions of control flow into reducible regions is formally defined, and its correctness is shown. This method is superior to previously published approaches because it replicates fewer nodes. Second, three methods for handling regions of irreducible control flow are evaluated with respect to their impact on compiler optimizations: (1) traditional node splitting; (2) optimized node splitting; and (3) DJ-graphs, which are used to recognize the nesting of irreducible (and reducible) loops and to apply common loop optimizations extended for irreducible loops. Experiments compare the performance of these approaches against a baseline in which irreducible loops go unrecognized and therefore cannot be subject to loop optimizations, which is typical for contemporary compilers. Measurements show improvements of 1 to 40% for these methods of handling irreducible loops over the unoptimized case. Optimized node splitting may be chosen to retrofit existing compilers, since it requires only a few changes to an optimizing compiler while limiting the code growth of compiled programs compared to traditional node splitting. Recognizing loops via DJ-graphs should be chosen for new compiler developments, since it requires more changes to the optimizer but does not significantly change the code size of compiled programs while yielding comparable improvements.
Handling irreducible loops should yield even more benefits for exploiting the instruction-level parallelism of modern architectures in the context of global instruction scheduling and of optimization techniques that may introduce irreducible loops, such as enhanced modulo scheduling.
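
As a rough illustration of the terms used in the abstract, the following minimal sketch tests a control-flow graph for reducibility with the textbook T1/T2 transformations and then applies plain (traditional) node splitting to the classic two-entry loop. It is not the optimized node splitting or the DJ-graph method that the paper defines. The function names `is_reducible` and `split_node`, the edge-list representation, and the copy naming scheme are assumptions made for this example, and every node is assumed to be reachable from the entry.

```python
from collections import defaultdict


def is_reducible(edges, entry):
    """Return True if the graph collapses to the entry node under T1/T2.

    Assumes every node is reachable from `entry` (illustrative helper).
    """
    succ = defaultdict(set)
    nodes = {entry}
    for u, v in edges:
        succ[u].add(v)
        nodes.update((u, v))

    changed = True
    while changed and len(nodes) > 1:
        changed = False
        # T1: delete self-loops.
        for n in nodes:
            succ[n].discard(n)
        # T2: merge a non-entry node with a unique predecessor into it.
        for n in sorted(nodes - {entry}):
            preds = [p for p in nodes if n in succ[p]]
            if len(preds) == 1:
                p = preds[0]
                succ[p].discard(n)
                succ[p] |= succ[n]
                succ.pop(n, None)
                nodes.discard(n)
                changed = True
                break
    return len(nodes) == 1


def split_node(edges, node):
    """Traditional node splitting: one copy of `node` per predecessor."""
    preds = sorted({u for u, v in edges if v == node and u != node})
    outs = [v for u, v in edges if u == node]
    result = [(u, v) for u, v in edges if u != node and v != node]
    for p in preds:
        copy = f"{node}@{p}"  # hypothetical naming scheme for the copies
        result.append((p, copy))
        # A self-loop on the original node becomes a self-loop on the copy.
        result.extend((copy, copy if v == node else v) for v in outs)
    return result


# Classic irreducible region: the loop {a, b} can be entered at a or at b.
triangle = [("e", "a"), ("e", "b"), ("a", "b"), ("b", "a")]
split = split_node(triangle, "b")
```

In this sketch `is_reducible(triangle, "e")` is false, while the graph returned by `split_node(triangle, "b")` is reducible at the cost of one extra copy of `b`. That code growth from replication is the cost the abstract says optimized node splitting limits.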