Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Overlapped loop support in the Cydra 5
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Parallelization of loops with exits on pipelined architectures
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Code generation schema for modulo scheduled loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
The Cydra 5 minisupercomputer: architecture and implementation
The Journal of Supercomputing - Special issue on instruction-level parallelism
The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ACM Computing Surveys (CSUR)
Stage scheduling: a technique to reduce the register requirements of a modulo schedule
Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Custom-fit processors: letting applications define architectures
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
A comparative study of modulo scheduling techniques
ICS '02 Proceedings of the 16th international conference on Supercomputing
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Swing Modulo Scheduling: A Lifetime-Sensitive Approach
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Modulo scheduling without overlapped lifetimes
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Hi-index | 0.00 |
Modulo scheduling is a very effective instruction scheduling technique that exploits Instruction Level Parallelism (ILP) in loop bodies by overlapping the execution of successive iterations. Unfortunately, modulo scheduling has been shown to cause heavy code expansion. To avoid the penalties of code expansion, some processors have dedicated hardware support for modulo scheduled loops. However, this dedicated hardware support has a cost in chip area, cycle time, processor complexity, and compiler complexity.This paper shows that the right combination of scheduling heuristics combined with speculative modulo scheduling can significantly reduce code expansion. In addition, several code generation schema heuristics are proposed to further reduce code expansion. The evaluations show that loops can be effectively modulo scheduled with an average code expansion only 1.5 times the original loop size. Compared with a state of the art modulo scheduler, our code size sensitive heuristics reduce the size of embedded domain benchmarks binaries by 30% on average. While performance is mostly unchanged, some applications show speed-ups up to 20% due to a reduction in instruction cache capacity misses.