This paper is a step towards enabling multidimensional software pipelining of non-perfectly nested loops on memory-constrained architectures. We propose a method to pipeline multiple inner loops without increasing the code size of the loop nest, apart from an outermost prolog and epilog. We focus on the domain of media and signal processing, where short inner loops are common and where embedded constraints drive the selection of code-size-conscious algorithms. Our first results indicate that the additional constraints associated with the method do not impede the extraction of significant amounts of instruction-level parallelism. In addition to preserving precious scratch-pad or cache memory, our method also avoids the performance overhead of prologs and epilogs resulting from pipelined inner loops with short trip counts.