Traditionally, software pipelining is applied either to the innermost loop of a given loop nest or from the innermost loop to the outer loops. In a companion paper, we proposed a scheduling method, called Single-dimension Software Pipelining (SSP), to software pipeline a multi-dimensional loop nest at an arbitrary loop level.

In this paper, we describe our solution to SSP code generation. In contrast to traditional software pipelining, SSP handles two distinct repetitive patterns, and thus requires new code generation algorithms. Further, these two distinct repetitive patterns complicate register assignment and require two levels of register renaming. As rotating registers support renaming at only one level, our solution is based on a combination of dynamic register renaming (using rotating registers) and static register renaming (using code replication). Finally, code size increase, an even more important issue for SSP than for traditional software pipelining, is also addressed. Optimizations are proposed to reduce code size without significant performance degradation.

We first present a code generation scheme and subsequently implement it for the IA-64 architecture, making effective use of rotating registers and predicated execution. We present some initial experimental results, which demonstrate not only the feasibility and correctness of our code generation scheme, but also its code quality.
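
To make the notion of dynamic register renaming concrete, the following is a minimal toy model of a rotating register file, written in Python purely for illustration. It is not the code generation scheme of the paper: the class `RotatingRegisterFile`, the function `pipelined_sum_of_neighbors`, and the register file size are all hypothetical, and the model covers only the single level of renaming that rotating registers provide. The second level, which the abstract says is obtained by code replication, is not shown.

```python
class RotatingRegisterFile:
    """Toy rotating register file (illustrative names, not a real ISA model)."""

    def __init__(self, size):
        self.size = size
        self.physical = [None] * size
        self.rrb = 0  # register rename base, added to every logical number

    def _index(self, logical):
        # Map a logical register number to a physical slot.
        return (logical + self.rrb) % self.size

    def write(self, logical, value):
        self.physical[self._index(logical)] = value

    def read(self, logical):
        return self.physical[self._index(logical)]

    def rotate(self):
        # Decrementing the base makes the value written to logical
        # register r visible as logical register r + 1 afterwards.
        self.rrb = (self.rrb - 1) % self.size


def pipelined_sum_of_neighbors(data):
    """Compute data[i] + data[i - 1] with one rotation per iteration.

    Every iteration writes its loaded value to logical register 0 and
    reads the previous iteration's value from logical register 1, so the
    loop body never changes even though the physical registers do.
    """
    regs = RotatingRegisterFile(4)  # size 4 is an arbitrary assumption
    out = []
    for i, x in enumerate(data):
        regs.write(0, x)  # "load" stage
        if i > 0:
            out.append(regs.read(0) + regs.read(1))  # "add" stage
        regs.rotate()  # stands in for the loop-closing branch
    return out
```

Because one rotation renames every live value by exactly one position per iteration of a single loop, a schedule with two distinct repetitive patterns needs an additional renaming mechanism, which is the motivation the abstract gives for combining rotation with static renaming.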