PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Automatic data/program partitioning using the single assignment principle
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Scheduling time-critical instructions on RISC machines
POPL '90 Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Scheduling time-critical instructions on RISC machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving the ratio of memory operations to floating-point operations in loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Software pipelining: a comparison and improvement
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Unroll-and-jam using uniformly generated sets
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A Development Environment for Horizontal Microcode
IEEE Transactions on Software Engineering
Fully Parallel Hardware/Software Codesign for Multi-Dimensional DSP Applications
CODES '96 Proceedings of the 4th International Workshop on Hardware/Software Co-Design
Improving register allocation for subscripted variables
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Reaching fast code faster: using modeling for efficient software thread integration on a VLIW DSP
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Software thread integration for instruction-level parallelism
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.08 |
Loop unwinding is a well-known technique for reducing loop overhead, exposing parallelism, and increasing the efficiency of pipelining. Traditional loop unwinding is limited to the innermost loop of a set of nested loops and the amount of unwinding is either fixed or must be specified by the user. In this paper we present a general technique, loop quantization, for unwinding multiple nested loops, explain its advantages over other transformations, and illustrate its practical effectiveness. An abstraction of nested loops is presented which leads to results about the complexity of computing quantizations and an algorithm.