Lifetime-sensitive modulo scheduling

Authors:
Richard A. Huff
Affiliations:
Cornell Univ., Ithaca, NY
Venue:
PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Year:
1993

Cited references:
22
Cited by:
96

Abstract

This paper shows how to software-pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. The technique, a novel bidirectional slack-scheduling method, has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results indicate near-optimal performance when measured against two absolute lower bounds: one on execution time, and a novel, schedule-independent one on register pressure.
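The two yardsticks named in the abstract can be made concrete with a small sketch. The code below computes, for a toy loop, a common form of the execution-time lower bound used in modulo scheduling (the minimum initiation interval, MII = max(ResMII, RecMII)). It also computes a simplified, schedule-independent register bound in the same spirit as the paper's: every value must live at least its minimum def-to-use distance, so the peak number of simultaneously live values is at least ceil(sum of minimum lifetimes / II). This is an illustration under stated assumptions, not the paper's algorithm, and it does not implement the bidirectional slack scheduler itself. The names `Edge`, `res_mii`, `rec_mii`, `min_dist`, and `min_avg_registers` are hypothetical. The sketch also assumes that every dependence edge is a register flow dependence and that each operation occupies one fully pipelined unit for a single cycle.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A dependence src -> dst (assumed here: every edge is a register flow dependence)."""
    src: str
    dst: str
    latency: int   # minimum issue-to-issue separation between src and dst
    distance: int  # iteration distance (0 = same iteration, 1 = next iteration, ...)


def res_mii(op_resource, units):
    """Resource bound; assumes each op uses one unit of one class for one cycle."""
    counts = {}
    for res in op_resource.values():
        counts[res] = counts.get(res, 0) + 1
    return max(math.ceil(n / units[res]) for res, n in counts.items())


def has_positive_cycle(ops, edges, ii):
    """True if some cycle has sum(latency - ii * distance) > 0, i.e. a
    recurrence cannot be satisfied at this initiation interval."""
    dist = {op: 0 for op in ops}  # longest paths from a virtual source
    for _ in range(len(ops)):
        changed = False
        for e in edges:
            w = e.latency - ii * e.distance
            if dist[e.src] + w > dist[e.dst]:
                dist[e.dst] = dist[e.src] + w
                changed = True
        if not changed:
            return False
    return True


def rec_mii(ops, edges):
    """Smallest II at which every recurrence is satisfiable."""
    for ii in range(1, sum(e.latency for e in edges) + 2):
        if not has_positive_cycle(ops, edges, ii):
            return ii
    raise ValueError("a zero-distance dependence cycle makes the loop unschedulable")


def min_dist(ops, edges, ii):
    """All-pairs longest paths with weights latency - ii * distance
    (max-plus Floyd-Warshall); only meaningful when ii >= RecMII."""
    neg = float("-inf")
    d = {p: {q: neg for q in ops} for p in ops}
    for e in edges:
        d[e.src][e.dst] = max(d[e.src][e.dst], e.latency - ii * e.distance)
    for k in ops:
        for i in ops:
            for j in ops:
                if d[i][k] + d[k][j] > d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def min_avg_registers(ops, edges, ii):
    """Each value lives at least MinDist(def, use) + ii * distance cycles;
    the peak live count is at least the average, sum(lifetimes) / ii."""
    d = min_dist(ops, edges, ii)
    total = 0
    for v in ops:
        uses = [e for e in edges if e.src == v]
        if uses:
            total += max(d[v][e.dst] + ii * e.distance for e in uses)
    return math.ceil(total / ii)


# Toy loop (hypothetical latencies): s = s + a[i] * c
ops = ["ld", "mul", "add"]
op_resource = {"ld": "mem", "mul": "alu", "add": "alu"}
units = {"mem": 1, "alu": 1}
edges = [
    Edge("ld", "mul", 2, 0),   # x = load a[i]; t = x * c
    Edge("mul", "add", 3, 0),  # s = s + t
    Edge("add", "add", 3, 1),  # s is carried into the next iteration
]
mii = max(res_mii(op_resource, units), rec_mii(ops, edges))
print(mii, min_avg_registers(ops, edges, mii))  # 3 3
```

In this toy loop the recurrence on `s` (RecMII = 3) dominates the ALU resource bound (ResMII = 2). At II = 3 the minimum lifetimes are 2, 3, and 3 cycles, so at least ceil(8 / 3) = 3 registers are needed no matter how the operations are placed. A scheduler that, like the paper's, aims to reach both bounds would need a schedule with II = 3 that uses only three registers.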