Software Pipelining in Nested Loops with Prolog-Epilog Merging

Authors:
Mohammed Fellahi;Albert Cohen
Affiliations:
Alchemy Group, INRIA Saclay, France, and HiPEAC Network; Alchemy Group, INRIA Saclay, France, and HiPEAC Network
Venue:
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Year:
2008

Abstract

Software pipelining (or modulo scheduling) is a powerful back-end optimization for exploiting instruction-level and vector parallelism. It is particularly popular for embedded devices because it improves computation throughput without increasing the size of the inner loop kernel (unlike loop unrolling), a desirable property for minimizing the amount of code in local memories or caches. Unfortunately, common media and signal processing codes exhibit series of low-trip-count inner loops. In this situation, software pipelining is often not an option: it incurs severe fill/drain time overheads and code size expansion due to nested prologs and epilogs. We propose a method to pipeline series of inner loops without increasing the size of the loop nest, apart from an outermost prolog and epilog. Our method achieves significant code size savings and allows pipelining of low-trip-count loops. These benefits come at the cost of additional scheduling constraints, leading to a linear optimization problem that trades memory usage for pipelining opportunities.
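
The difference between nested prologs/epilogs and a single outermost prolog and epilog can be illustrated with a small hand-written sketch. This is not the authors' algorithm: the two-stage body (stage_a, stage_b), the function names, and the row-major array layout are all assumptions made for illustration, and sequential C can only mimic the order of the schedule (on a VLIW machine the two kernel statements would issue in the same cycle). The sketch also covers a single inner loop only, and ignores the scheduling constraints and the memory trade-off mentioned in the abstract.

```c
#include <assert.h>

/* Hypothetical two-stage loop body: stage A produces a value that
   stage B consumes one pipeline step later. */
static int stage_a(int x) { return x * 2; }
static int stage_b(int t) { return t + 1; }

/* Unpipelined reference: both stages run back to back for each element. */
void reference(int m, int n, const int *in, int *out) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            out[i * n + j] = stage_b(stage_a(in[i * n + j]));
}

/* Conventional software pipelining of the inner loop: every outer
   iteration pays for its own prolog (fill) and epilog (drain). */
void pipelined_nested(int m, int n, const int *in, int *out) {
    assert(m >= 1 && n >= 1);  /* a 2-stage pipeline needs at least 1 trip */
    for (int i = 0; i < m; i++) {
        const int *row = in + i * n;
        int *orow = out + i * n;
        int t = stage_a(row[0]);            /* inner prolog */
        for (int j = 1; j < n; j++) {       /* kernel */
            orow[j - 1] = stage_b(t);       /* stage B of element j-1 */
            t = stage_a(row[j]);            /* stage A of element j   */
        }
        orow[n - 1] = stage_b(t);           /* inner epilog */
    }
}

/* Merged variant: the drain of row i overlaps with the fill of row i+1
   inside the kernel, so only one outermost prolog and epilog remain. */
void pipelined_merged(int m, int n, const int *in, int *out) {
    assert(m >= 1 && n >= 1);
    int t = stage_a(in[0]);                 /* outermost prolog */
    for (int i = 0; i < m; i++) {
        /* the very first element was already started by the prolog */
        for (int j = (i == 0) ? 1 : 0; j < n; j++) {
            int k = i * n + j;              /* row-major flat index */
            out[k - 1] = stage_b(t);        /* stage B of previous element */
            t = stage_a(in[k]);             /* stage A of current element  */
        }
    }
    out[m * n - 1] = stage_b(t);            /* outermost epilog */
}
```

In this sketch the nested version executes m prologs and m epilogs, while the merged version executes one of each regardless of m, which is why the gap matters most when the inner trip count n is small. All three functions should produce identical output arrays.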