Single-Dimension Software Pipelining for Multi-Dimensional Loops

Authors:
Hongbo Rong;Zhizhong Tang;R. Govindarajan;Alban Douillet;Guang R. Gao
Affiliations:
-;-;-;-;-
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2004

Citing 25
Cited 18

Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Fine-grain parallelization and the wavefront method

Selected papers of the second workshop on Languages and compilers for parallel computing
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Lifetime-sensitive modulo scheduling

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Instruction-level parallel processing: history, overview, and perspective

The Journal of Supercomputing - Special issue on instruction-level parallelism
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving the ratio of memory operations to floating-point operations in loops

ACM Transactions on Programming Languages and Systems (TOPLAS)
Software pipelining

ACM Computing Surveys (CSUR)
Resource-Constrained Software Pipelining

IEEE Transactions on Parallel and Distributed Systems
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
A Framework for Resource-Constrained Rate-Optimal Software Pipelining

IEEE Transactions on Parallel and Distributed Systems
Parallelizing nonnumerical code with selective scheduling and software pipelining

ACM Transactions on Programming Languages and Systems (TOPLAS)
The parallel execution of DO loops

Communications of the ACM
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Constructing and exploiting linear schedules with prescribed parallelism

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Loop Transformations for Restructuring Compilers: The Foundations

Loop Transformations for Restructuring Compilers: The Foundations
Scheduling and Automatic Parallelization

Scheduling and Automatic Parallelization
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Constructive Methods for Scheduling Uniform Loop Nests

IEEE Transactions on Parallel and Distributed Systems
Efficient Pipelining of Nested Loops: Unroll-and-Squash

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Optimal Software Pipelining of Nested Loops

Proceedings of the 8th International Symposium on Parallel Processing
Automatic Parallelization in the Polytope Model

The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
Pipelining-Dovetailing: A Transformation to Enhance Software Pipelining for Nested Loops

CC '96 Proceedings of the 6th International Conference on Compiler Construction
Improving Software Pipelining With Unroll-and-Jam

HICSS '96 Proceedings of the 29th Hawaii International Conference on System Sciences Volume 1: Software Technology and Architecture
Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization

Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Register allocation for software pipelined multi-dimensional loops

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Single-dimension software pipelining for multidimensional loops

ACM Transactions on Architecture and Code Optimization (TACO)
Hierarchical coarse-grained stream compilation for software defined radio

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Code-size conscious pipelining of imperfectly nested loops

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Software Pipelining in Nested Loops with Prolog-Epilog Merging

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Outer loop pipelining for application specific datapaths in FPGAs

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Stream Compilation for Real-Time Embedded Multicore Systems

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Combining optimizations in automated low power design

Proceedings of the Conference on Design, Automation and Test in Europe
Hierarchical multithreading: programming model and system software

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Register pressure in software-pipelined loop nests: fast computation and impact on architecture design

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Combined ILP and register tiling: analytical model and optimization framework

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Improving performance of nested loops on reconfigurable array processors

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
A static data dependence analysis approach for software pipelining

NPC'05 Proceedings of the 2005 IFIP international conference on Network and Parallel Computing
A dynamic data dependence analysis approach for software pipelining

NPC'05 Proceedings of the 2005 IFIP international conference on Network and Parallel Computing
Multi-dimensional kernel generation for loop nest software pipelining

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Automated Mapping of the MapReduce Pattern onto Parallel Computing Platforms

Journal of Signal Processing Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditionally, software pipelining is applied either to theinnermost loop of a given loop nest or from the innermostloop to outer loops. In this paper, we propose a three-stepapproach, called Single-dimension Software Pipelining(SSP), to software pipeline a loop nest at an arbitraryloop level.The first step identifies the most profitable loop level forsoftware pipelining in terms of initiation rate or data reusepotential. The second step simplifies the multi-dimensionaldata-dependence graph (DDG) into a 1-dimensional DDGand constructs a 1-dimensional schedule for the selectedloop level. The third step derives a simple mapping functionwhich specifies the schedule time for the operations of themulti-dimensional loop, based on the 1-dimensional schedule.We prove that the SSP method is correct and at least asefficient as other modulo scheduling methods.We establish the feasibility and correctness of our approachby implementing it on the IA-64 architecture. Experimentalresults on a small number of loops show significantperformance improvements over existing modulo schedulingmethods that software pipeline a loop nest from the innermostloop.