Code-size conscious pipelining of imperfectly nested loops

Authors:
Mohammed Fellahi; Albert Cohen; Sid Touati
Affiliations:
INRIA Futurs, Orsay, France; INRIA Futurs, Orsay, France; University of Versailles, France
Venue:
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Year:
2007

Abstract

This paper is a step towards enabling multidimensional software pipelining of imperfectly nested loops on memory-constrained architectures. We propose a method to pipeline multiple inner loops without increasing the code size of the loop nest, apart from a single outermost prolog and epilog. We focus on the domain of media and signal processing, where short inner loops are common and where embedded constraints drive the selection of code-size conscious algorithms. Our initial results indicate that the additional constraints imposed by the method do not impede the extraction of significant amounts of instruction-level parallelism. In addition to preserving precious scratch-pad or cache memory, our method also avoids the performance overhead of the prologs and epilogs that result from pipelining inner loops with short trip counts.
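The contrast between per-inner-loop and outermost prologs and epilogs can be illustrated with a small sketch. This is not the authors' algorithm: it uses a single perfectly nested, rectangular, dependence-free loop, a hypothetical two-stage body (`stage1`, `stage2`), and a linearized iteration space, and in Python the "overlap" of stages is only conceptual. It shows that pipelining each inner loop separately executes one prolog and one epilog per outer iteration, whereas letting the pipeline run across outer iterations needs only one pair for the whole nest.

```python
def stage1(x):
    """First pipeline stage (hypothetical): stands in for a load and multiply."""
    return x * 2


def stage2(t):
    """Second pipeline stage (hypothetical): stands in for an add before the store."""
    return t + 1


def reference(a):
    """Original nest: both stages run back to back in every inner iteration."""
    return [[stage2(stage1(x)) for x in row] for row in a]


def inner_pipelined(a):
    """Pipeline each inner loop separately: one prolog/epilog pair per outer iteration."""
    b = [[0] * len(row) for row in a]
    pairs = 0  # number of prolog/epilog pairs executed
    for i, row in enumerate(a):
        m = len(row)
        t = stage1(row[0])           # inner prolog: fill the pipeline
        for j in range(1, m):        # kernel: stage 2 of j-1 alongside stage 1 of j
            b[i][j - 1] = stage2(t)
            t = stage1(row[j])
        b[i][m - 1] = stage2(t)      # inner epilog: drain the pipeline
        pairs += 1
    return b, pairs


def outer_pipelined(a):
    """Keep the pipeline full across outer iterations: one outermost prolog/epilog pair."""
    n, m = len(a), len(a[0])         # assumes a non-empty rectangular array
    b = [[0] * m for _ in range(n)]
    t = stage1(a[0][0])              # outermost prolog
    for k in range(1, n * m):        # kernel over the linearized iteration space
        pi, pj = divmod(k - 1, m)    # previous iteration point (stage 2)
        i, j = divmod(k, m)          # current iteration point (stage 1)
        b[pi][pj] = stage2(t)
        t = stage1(a[i][j])
    b[n - 1][m - 1] = stage2(t)      # outermost epilog
    return b, 1


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
b_inner, pairs_inner = inner_pipelined(a)
b_outer, pairs_outer = outer_pipelined(a)
```

With a short inner trip count (3 here), the inner-pipelined version spends one prolog and one epilog for every two kernel iterations, while the outer-pipelined version pays that cost once for the entire nest; both produce the same array as `reference(a)`.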