Software pipelining is an important instruction scheduling technique that overlaps successive loop iterations so that they execute in parallel. This paper studies register allocation for software-pipelined loops, both with and without hardware features specifically designed to support software pipelining. Register allocation for software pipelines poses novel problems that call for unconventional solutions, especially when hardware support is present. The paper formulates these problems and presents a number of alternative solution strategies. The alternatives are evaluated on more than one thousand loops to determine the best register allocation strategy, both with and without hardware support for software pipelining.