Multi-dimensional kernel generation for loop nest software pipelining

Authors:
Alban Douillet; Hongbo Rong; Guang R. Gao
Affiliations:
University of Delaware, Newark, DE; Microsoft Corporation, Redmond, WA; University of Delaware, Newark, DE
Venue:
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Year:
2006

Citing 19
Cited 0

Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation

Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation

Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture

Software pipelining
ACM Computing Surveys (CSUR)

A Framework for Resource-Constrained Rate-Optimal Software Pipelining
IEEE Transactions on Parallel and Distributed Systems

Combining loop transformations considering caches and scheduling
International Journal of Parallel Programming - Special issue: MICRO-29, 29th annual IEEE/ACM international symposium on microarchitecture

Constructing and exploiting linear schedules with prescribed parallelism
ACM Transactions on Design Automation of Electronic Systems (TODAES)

Efficient Pipelining of Nested Loops: Unroll-and-Squash
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium

Extending Software Pipelining Techniques for Scheduling Nested Loops
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing

Pipelining-Dovetailing: A Transformation to Enhance Software Pipelining for Nested Loops
CC '96 Proceedings of the 6th International Conference on Compiler Construction

Software Pipelining of Nested Loops
CC '01 Proceedings of the 10th International Conference on Compiler Construction

Improving Software Pipelining With Unroll-and-Jam
HICSS '96 Proceedings of the 29th Hawaii International Conference on System Sciences Volume 1: Software Technology and Architecture

Global optimization of microprograms through modular control constructs
MICRO 12 Proceedings of the 12th annual workshop on Microprogramming

Swing Modulo Scheduling: A Lifetime-Sensitive Approach
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

Register allocation for software pipelined multi-dimensional loops
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

A compiler framework for loop nest software-pipelining

Register pressure in software-pipelined loop nests: fast computation and impact on architecture design
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing

Abstract

Single-dimension Software Pipelining (SSP) has been proposed as an effective software pipelining technique for multi-dimensional loops [16]. This paper introduces, for the first time, the scheduling methods that actually produce the kernel code. Because the problem is multi-dimensional, scheduling is more complex and challenging than in traditional modulo scheduling: the scheduler must handle multiple subkernels and initiation rates under specific scheduling constraints while producing a solution that minimizes the execution time of the final schedule. Three approaches are proposed: the level-by-level method, which schedules operations in loop-level order, starting from the innermost level, and does not let other operations interfere with levels already scheduled; the flat method, which schedules operations from different loop levels with the same priority; and the hybrid method, which uses the level-by-level mechanism for the innermost level and the flat solution for the other levels. The methods subsume Huff's modulo scheduling [8] for single loops as a special case. We also break a scheduling constraint introduced in earlier publications, which allows for a more compact kernel. The proposed approaches were implemented in the Open64/ORC compiler and evaluated on loop nests from the Livermore, SPEC2000 and NAS benchmarks.
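
The difference between the three approaches can be pictured as a difference in the order in which operations are handed to the scheduler. The Python sketch below illustrates only that ordering and is not the paper's algorithm: the `Op` record, its `level` and `priority` fields, and the three function names are hypothetical, and the sketch does not model subkernels, initiation rates, resource constraints, or the rule that later operations may not disturb levels already scheduled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    level: int      # hypothetical: loop depth, larger means more deeply nested
    priority: int   # hypothetical: larger means the scheduler considers it earlier


def level_by_level_order(ops):
    # Innermost level first; within a level, highest priority first.
    return sorted(ops, key=lambda op: (-op.level, -op.priority))


def flat_order(ops):
    # The loop level is ignored; only the priority decides the order.
    return sorted(ops, key=lambda op: -op.priority)


def hybrid_order(ops):
    # Innermost level is ordered on its own, the remaining levels are ordered flat.
    innermost = max(op.level for op in ops)
    inner = [op for op in ops if op.level == innermost]
    outer = [op for op in ops if op.level != innermost]
    return flat_order(inner) + flat_order(outer)


# Made-up operations from a three-deep loop nest (level 3 is the innermost loop).
ops = [Op("a", 1, 5), Op("b", 2, 9), Op("c", 3, 1), Op("d", 3, 4), Op("e", 2, 2)]

for order in (level_by_level_order, flat_order, hybrid_order):
    print(order.__name__, [op.name for op in order(ops)])
```

On this made-up input, the level-by-level and hybrid orders both start with the two innermost operations and differ only in how the outer-level operations are interleaved, while the flat order mixes all levels by priority.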