Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Highly concurrent scalar processing
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Highly concurrent scalar processing
Highly concurrent scalar processing
URPR—An extension of URCR for software pipelining
MICRO 19 Proceedings of the 19th annual workshop on Microprogramming
The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data structures: form and function
Data structures: form and function
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Overlapped loop support in the Cydra 5
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
“Combining” as a compilation technique for VLIW architectures
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Supercompilers for parallel and vector computers
Supercompilers for parallel and vector computers
Selected papers of the second workshop on Languages and compilers for parallel computing
Parallelization of loops with exits on pipelined architectures
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
A timed Petri-net model for fine-grain loop scheduling
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Architecture synthesis of high-performance application-specific processors
Architecture synthesis of high-performance application-specific processors
Register allocation for software pipelined loops
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Instruction scheduling using genetic algorithms
Instruction scheduling using genetic algorithms
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Code generation schema for modulo scheduled loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A dynamic-programming technique for compacting loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
The Cydra 5 minisupercomputer: architecture and implementation
The Journal of Supercomputing - Special issue on instruction-level parallelism
The Journal of Supercomputing - Special issue on instruction-level parallelism
Avoidance and suppression of compensation code in a trace scheduling compiler
ACM Transactions on Programming Languages and Systems (TOPLAS)
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Minimizing register requirements under resource-constrained rate-optimal software pipelining
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Specification of software pipelining using Petri nets
International Journal of Parallel Programming
Software pipelining: a comparison and improvement
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Using a lookahead window in a compaction-based parallelizing compiler
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Realistic scheduling: compaction for pipelined architectures
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
A compilation technique for software pipelining of loops with conditional jumps
MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
GURPR—a method for global software pipelining
MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
An efficient search algorithm to find the elementary circuits of a graph
Communications of the ACM
Optimizing Supercompilers for Supercomputers
Optimizing Supercompilers for Supercomputers
IEEE Software
A Development Environment for Horizontal Microcode
IEEE Transactions on Software Engineering
Perfect Pipelining: A New Loop Parallelization Technique
ESOP '88 Proceedings of the 2nd European Symposium on Programming
Software pipelining: A Genetic Algorithm Approach
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Software Pipelining: Petri Net Pacemaker
PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Architectural support for the efficient generation of code for horizontal architectures
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Global optimization of microprograms through modular control constructs
MICRO 12 Proceedings of the 12th annual workshop on Microprogramming
An approach to microprogram optimization considering resource occupancy and instruction formats
MICRO 10 Proceedings of the 10th annual workshop on Microprogramming
Local code generation and compaction in optimizing microcode compilers
Local code generation and compaction in optimizing microcode compilers
A systolic array optimizing compiler
A systolic array optimizing compiler
Compaction-based parallelization
Compaction-based parallelization
Efficient static scheduling of loops on synchronous multiprocessors
Efficient static scheduling of loops on synchronous multiprocessors
Petri net versus modulo scheduling for software pipelining
Proceedings of the 28th annual international symposium on Microarchitecture
Heuristics for register-constrained software pipelining
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Software pipelining loops with conditional branches
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Determining the Order of Processor Transactions in StaticallyScheduled Multiprocessors
Journal of VLSI Signal Processing Systems
Circuit Retiming Applied to Decomposed Software Pipelining
IEEE Transactions on Parallel and Distributed Systems
Optimal Modulo Scheduling Through Enumeration
International Journal of Parallel Programming
Modulo Scheduling with Reduced Register Pressure
IEEE Transactions on Computers
Quantitative Evaluation of Register Pressure on Software Pipelined Loops
International Journal of Parallel Programming
Modulo scheduling for the TMS320C6x VLIW DSP architecture
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Improved spill code generation for software pipelined loops
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Loop Shifting for Loop Compaction
International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Proceedings of the 2001 Asia and South Pacific Design Automation Conference
Register pressure responsive software pipelining
Proceedings of the 2001 ACM symposium on Applied computing
Loop Transformations for Architectures with Partitioned Register Banks
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
IEEE Transactions on Computers
On the Boosting of Instruction Scheduling by Renaming
The Journal of Supercomputing
Loop fusion for clustered VLIW architectures
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Modulo scheduling with integrated register spilling for clustered VLIW architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
PACT HDL: a C compiler targeting ASICs and FPGAs with power and performance optimizations
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Validating software pipelining optimizations
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Minimizing Buffer Requirements under Rate-Optimal Schedule in Regular Dataflow Networks
Journal of VLSI Signal Processing Systems
A Vectorizing Compiler for Multimedia Extensions
International Journal of Parallel Programming
Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling
IEEE Transactions on Computers
Run-Time Support to Register Allocation for Loop Parallelization of Image Processing Programs
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
Efficient Pipelining of Nested Loops: Unroll-and-Squash
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Loop Shifting for Loop Compaction
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Software Bubbles: Using Predication to Compensate for Aliasing in Software Pipelines
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Pipelined Java Virtual Machine Interpreters
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Software Pipelining of Nested Loops
CC '01 Proceedings of the 10th International Conference on Compiler Construction
A First Step Towards Time Optimal Software Pipelining of Loops with Control Flows
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Reduced code size modulo scheduling in the absence of hardware support
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Split-Path Enhanced Pipeline Scheduling
IEEE Transactions on Parallel and Distributed Systems
Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Using Graph Models in Retargetable Optimizing Compilers for Microprocessors with VLIW Architectures
Cybernetics and Systems Analysis
Efficient 2D FFT implementation on mediaprocessors
Parallel Computing
An experimental evaluation of scalar replacement on scientific benchmarks
Software—Practice & Experience
Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems
Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Real-Time Imaging - Special issue on software engineering
An approach for integrating basic retiming and software pipelining
Proceedings of the 4th ACM international conference on Embedded software
Improving Data Locality by Array Contraction
IEEE Transactions on Computers
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
IEEE Transactions on Parallel and Distributed Systems
Optimizing aggregate array computations in loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
A Cycle-Accurate ISS for a Dynamically Reconfigurable Processor Architecture
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Dynamic loop pipelining in data-driven architectures
Proceedings of the 2nd conference on Computing frontiers
Tabu Search Algorithms for Cyclic Machine Scheduling Problems
Journal of Scheduling
Register allocation for software pipelined multi-dimensional loops
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
SAOL: The MPEG-4 Structured Audio Orchestra Language
Computer Music Journal
Specific optimization features in a C compiler for DSPs
Programming and Computing Software
On multiprocessor task scheduling using efficient state space search approaches
Journal of Parallel and Distributed Computing
Generic software pipelining at the assembly level
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
EPspectra: a formal toolkit for developing DSP software applications
Theory and Practice of Logic Programming
Single-dimension software pipelining for multidimensional loops
ACM Transactions on Architecture and Code Optimization (TACO)
Loop pipelining for high-throughput stream computation using self-timed rings
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
The impact of loop unrolling on controller delay in high level synthesis
Proceedings of the conference on Design, automation and test in Europe
Software optimization of video codecs on pentium processor with MMX technology
EURASIP Journal on Applied Signal Processing
Pfelib: a performance primitives library for embedded vision
EURASIP Journal on Embedded Systems
A new strategy for multiprocessor scheduling of cyclic task graphs
International Journal of High Performance Computing and Networking
Automatic SIMD vectorization of chains of recurrences
Proceedings of the 22nd annual international conference on Supercomputing
Register allocation for software pipelined multidimensional loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing near-ML MIMO detector for SDR baseband on parallel programmable architectures
Proceedings of the conference on Design, automation and test in Europe
Proceedings of the conference on Design, automation and test in Europe
Integrated Modulo Scheduling for Clustered VLIW Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Resource aware mapping on coarse grained reconfigurable arrays
Microprocessors & Microsystems
Outer loop pipelining for application specific datapaths in FPGAs
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays
The Journal of Supercomputing
Modulo scheduling without overlapped lifetimes
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Generic multiphase software pipelined partial FFT on instruction level parallel architectures
IEEE Transactions on Signal Processing
Hardware/software partitioning and pipelined scheduling on runtime reconfigurable FPGAs
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Preprocessing strategy for effective modulo scheduling on multi-issue digital signal processors
CC'07 Proceedings of the 16th international conference on Compiler construction
CC'07 Proceedings of the 16th international conference on Compiler construction
MIRS: modulo scheduling with integrated register spilling
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Towards a source level compiler: source level modulo scheduling
Program analysis and compilation, theory and practice
FAST: fast architecture sensitive tree search on modern CPUs and GPUs
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Introducing the semi-stencil algorithm
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Automatic memory partitioning and scheduling for throughput and power optimization
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Code transformations for embedded reconfigurable computing architectures
GTTSE'09 Proceedings of the 3rd international summer school conference on Generative and transformational techniques in software engineering III
Designing fast architecture-sensitive tree search on modern multicore/many-core processors
ACM Transactions on Database Systems (TODS)
Combined ILP and register tiling: analytical model and optimization framework
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Control-Flow semantics for assembly-level data-flow graphs
RelMiCS'05 Proceedings of the 8th international conference on Relational Methods in Computer Science, Proceedings of the 3rd international conference on Applications of Kleene Algebra
Trace-Based runtime instruction rescheduling for architecture extension
ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
Program parallelization using synchronized pipelining
LOPSTR'09 Proceedings of the 19th international conference on Logic-Based Program Synthesis and Transformation
Increasing software-pipelined loops in the itanium-like architecture
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
SCAN: a heuristic for near-optimal software pipelining
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Multi-dimensional kernel generation for loop nest software pipelining
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Automated Mapping of the MapReduce Pattern onto Parallel Computing Platforms
Journal of Signal Processing Systems
Software pipelining support for transport triggered architecture processors
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
Using ownership to reason about inherent parallelism in object-oriented programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Batch-pipelining for multicore H.264 decoding
Journal of Visual Communication and Image Representation
Mathematical and Computer Modelling: An International Journal
Large-scale energy-efficient graph traversal: a path to efficient data-intensive supercomputing
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Compositional approach applied to loop specialization
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
International Journal of Organizational and Collective Intelligence
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
The resource-constrained modulo scheduling problem: an experimental study
Computational Optimization and Applications
Loop acceleration exploration for ASIP architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A survey of pipelined workflow scheduling: Models and algorithms
ACM Computing Surveys (CSUR)
Just-In-Time Software Pipelining
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
HEAP: A Highly Efficient Adaptive multi-Processor framework
Microprocessors & Microsystems
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture
Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems
Predicate-aware, makespan-preserving software pipelining of scheduling tables
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.02 |
Utilizing parallelism at the instruction level is an important way to improve performance. Because the time spent in loop execution dominates total execution time, a large body of optimizations focuses on decreasing the time to execute each iteration. Software pipelining is a technique that reforms the loop so that a faster execution rate is realized. Iterations are executed in overlapped fashion to increase parallelism.Let {ABC}n represent a loop containing operations A, B, C that is executed n times. Although the operations of a single iteration can be parallelized, more parallelism may be achieved if the entire loop is considered rather than a single iteration. The software pipelining transformation utilizes the fact that a loop {ABC}n is equivalent to A{BCA}n−1BC. Although the operations contained in the loop do not change, the operations are from different iterations of the original loop.Various algorithms for software pipelining exist. A comparison of the alternative methods for software pipelining is presented. The relationships between the methods are explored and possibilities for improvement highlighted.