Software pipelining

Authors:
Vicki H. Allan; Reese B. Jones; Randall M. Lee; Stephen J. Allan
Affiliations:
Utah State Univ., Logan; Evans and Sutherland, Salt Lake City, UT; DAKCS, Ogden, UT; Utah State Univ., Logan
Venue:
ACM Computing Surveys (CSUR)
Year:
1995

Abstract

Utilizing parallelism at the instruction level is an important way to improve performance. Because loop execution typically dominates total execution time, a large body of optimizations focuses on decreasing the time to execute each iteration. Software pipelining is a technique that restructures a loop so that it executes at a faster rate: iterations are executed in an overlapped fashion to increase parallelism.

Let $\{ABC\}^n$ represent a loop containing operations A, B, and C that is executed n times. Although the operations of a single iteration can be parallelized, more parallelism may be achieved if the entire loop is considered rather than a single iteration. The software pipelining transformation exploits the fact that the loop $\{ABC\}^n$ is equivalent to $A\{BCA\}^{n-1}BC$. Although the operations contained in the loop body do not change, in each pass of the new loop B and C come from one iteration of the original loop and A comes from the next.

Various algorithms for software pipelining exist. A comparison of these alternative methods is presented, the relationships between them are explored, and possibilities for improvement are highlighted.
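The identity $\{ABC\}^n = A\{BCA\}^{n-1}BC$ can be checked mechanically. The following is a minimal Python sketch, not taken from the paper: the function names `original_loop`, `pipelined_loop`, and `flatten` are hypothetical, and operations are modeled only as (name, iteration) tags. The sketch checks that the rotated loop (prologue, n-1 kernel passes, epilogue) issues exactly the same operations in the same order, and shows which original iterations each kernel pass draws from. It does not model dependences, latencies, or machine resources; it assumes n >= 1.

```python
from typing import List, Tuple

Op = Tuple[str, int]  # (operation name, original iteration index)


def original_loop(n: int) -> List[Op]:
    """Operations issued by {ABC}^n, in order, tagged with their iteration."""
    return [(op, i) for i in range(n) for op in "ABC"]


def pipelined_loop(n: int) -> Tuple[List[Op], List[List[Op]], List[Op]]:
    """Split A{BCA}^(n-1)BC into a prologue, n-1 kernel passes, and an epilogue."""
    if n < 1:
        raise ValueError("the identity assumes at least one iteration")
    prologue = [("A", 0)]  # start iteration 0
    # Each kernel pass finishes iteration i (B, C) and starts iteration i+1 (A).
    kernel = [[("B", i), ("C", i), ("A", i + 1)] for i in range(n - 1)]
    epilogue = [("B", n - 1), ("C", n - 1)]  # finish the last iteration
    return prologue, kernel, epilogue


def flatten(prologue: List[Op], kernel: List[List[Op]], epilogue: List[Op]) -> List[Op]:
    """Concatenate the three parts back into one sequential operation trace."""
    return prologue + [op for body in kernel for op in body] + epilogue


if __name__ == "__main__":
    # The rotated loop performs the same operations in the same order.
    for n in range(1, 8):
        assert flatten(*pipelined_loop(n)) == original_loop(n)
    # Each kernel pass mixes operations from two consecutive iterations.
    _, kernel, _ = pipelined_loop(4)
    for body in kernel:
        print(body, "-> iterations", sorted({i for _, i in body}))
```

Running the script prints the three kernel passes for n = 4, which draw from iterations 0 and 1, then 1 and 2, then 2 and 3. The sequential order is unchanged. The potential gain comes from the grouping: if A of iteration i+1 does not depend on the results of B or C from iteration i, a machine with enough functional units could issue all three operations of a kernel pass in parallel.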