Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
The structure of parafrase-2: an advanced parallelizing compiler for C and FORTRAN
Selected papers of the second workshop on Languages and compilers for parallel computing
The Omega test: a fast and practical integer programming algorithm for dependence analysis
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The Polaris internal representation
International Journal of Parallel Programming
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
ACM Computing Surveys (CSUR)
Dr. Dobb's Journal
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Efficient static scheduling of loops on synchronous multiprocessors
Efficient static scheduling of loops on synchronous multiprocessors
Hi-index | 0.00 |
Modulo scheduling is a major optimization of high performance compilers wherein The body of a loop is replaced by an overlapping of instructions from different iterations. Hence the compiler can schedule more instructions in parallel than in the original option. Modulo scheduling, being a scheduling optimization, is a typical backend optimization relying on detailed description of the underlying CPU and its instructions to produce a good schedule. This work considers the problem of applying modulo scheduling at source level as a loop transformation, using only general information of the underlying CPU architecture. By doing so it is possible: a) Create a more retargeble compiler as modulo scheduling is now applied at source level, b) Study possible interactions between modulo scheduling and common loop transformations. c) Obtain a source level optimizer whose output is readable to the programmer, yet its final output can be efficiently compiled by a relatively "simple" compiler. Experimental results show that source level modulo scheduling can improve performance also when low level modulo scheduling is applied by the final compiler, indicating that high level modulo scheduling and low level modulo scheduling can co-exist to improve performance. An algorithm for source level modulo scheduling modifying the abstract syntax tree of a program is presented. This algorithm has been implemented in an automatic parallelizer (Tiny). Preliminary experiments yield runtime and power improvements also for the ARM CPU for embedded systems.