Iterative modulo scheduling: an algorithm for software pipelining loops

Authors:
B. Ramakrishna Rau
Affiliations:
Hewlett-Packard Laboratories, 1501 Page Mill Road, Bldg. 3L, Palo Alto, CA
Venue:
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Year:
1994

Citing 30
Cited 205

Highly concurrent scalar processing

Highly concurrent scalar processing
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture

Selected papers of the second workshop on Languages and compilers for parallel computing
Parallelization of loops with exits on pipelined architectures

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Circular scheduling: a new technique to perform software pipelining

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
GURPR*: a new global software pipelining algorithm

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Register allocation for software pipelined loops

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Code generation schema for modulo scheduled loops

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Reverse If-Conversion

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Sentinel scheduling: a model for compiler-controlled speculative execution

ACM Transactions on Computer Systems (TOCS)
The multiflow trace scheduling compiler

The Journal of Supercomputing - Special issue on instruction-level parallelism
The Cydra 5 minisupercomputer: architecture and implementation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Compiling for the Cydra 5

The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Loop optimization for horizontal microcoded machines

ICS '90 Proceedings of the 4th international conference on Supercomputing
A compilation technique for software pipelining of loops with conditional jumps

MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
A comparison of list schedules for parallel processing systems

Communications of the ACM
An efficient search algorithm to find the elementary circuits of a graph

Communications of the ACM
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
The Design and Analysis of Computer Algorithms

The Design and Analysis of Computer Algorithms
Data Flow and Dependence Analysis for Instruction Level Parallelism

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Acceleration of First and Higher Order Recurrences on Processors with Instruction Level Parallelism

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Scheduling Loops on Parallel Processors: A Simple Algorithm with Close to Optimum Performance

CONPAR '92/ VAPP V Proceedings of the Second Joint International Conference on Vector and Parallel Processing: Parallel Processing
A Polynomial Time Method for Optimal Software Pipelining

CONPAR '92/ VAPP V Proceedings of the Second Joint International Conference on Vector and Parallel Processing: Parallel Processing
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
A technique of global optimization of microprograms

MICRO 11 Proceedings of the 11th annual workshop on Microprogramming

Scheduling and mapping: software pipelining in the presence of structural hazards

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Software pipelining

ACM Computing Surveys (CSUR)
Optimum modulo schedules for minimum register requirements

ICS '95 Proceedings of the 9th international conference on Supercomputing
Petri net versus modulo scheduling for software pipelining

Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling with multiple initiation intervals

Proceedings of the 28th annual international symposium on Microarchitecture
Region-based compilation: an introduction and motivation

Proceedings of the 28th annual international symposium on Microarchitecture
Register allocation for predicated code

Proceedings of the 28th annual international symposium on Microarchitecture
Unrolling-based optimizations for modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule

Proceedings of the 28th annual international symposium on Microarchitecture
Hypernode reduction modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
Software pipelining showdown: optimal vs. heuristic methods in a production compiler

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
A reduced multipipeline machine description that preserves scheduling constraints

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Speculative hedge: regulating compile-time speculation against profile variations

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Heuristics for register-constrained software pipelining

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Software pipelining loops with conditional branches

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Meld scheduling: relaxing scheduling constraints across region boundaries

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Optimization of machine descriptions for efficient use

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Efficient formulation for optimal modulo schedulers

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs

ICS '97 Proceedings of the 11th international conference on Supercomputing
Cache sensitive modulo scheduling

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A general algorithm for tiling the register level

ICS '98 Proceedings of the 12th international conference on Supercomputing
Resource widening versus replication: limits and performance-cost trade-off

ICS '98 Proceedings of the 12th international conference on Supercomputing
Optimal Modulo Scheduling Through Enumeration

International Journal of Parallel Programming
Modulo Scheduling with Reduced Register Pressure

IEEE Transactions on Computers
Reducing Data Hazards on Multi-pipelined DSP Architecture with Loop Scheduling

Journal of VLSI Signal Processing Systems - Special issue on future directions in the design and implementations of DSP systems
Quantitative Evaluation of Register Pressure on Software Pipelined Loops

International Journal of Parallel Programming
Split-path enhanced pipeline scheduling for loops with control flows

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Widening resources: a cost-effective technique for aggressive ILP architectures

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MPS: Miss-Path Scheduling for Multiple-Issue Processors

IEEE Transactions on Computers
Modulo scheduling for the TMS320C6x VLIW DSP architecture

Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Probabilistic Loop Scheduling for Applications with Uncertain Execution Time

IEEE Transactions on Computers
A recursive time estimation algorithm for program traces under resource constraints

SAC '98 Proceedings of the 1998 ACM symposium on Applied Computing
Optimized unrolling of nested loops

Proceedings of the 14th international conference on Supercomputing
Improved spill code generation for software pipelined loops

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Properties of Rescheduling Size Invariance for Dynamic Rescheduling-Based VLIW Cross-Generation Compatibility

IEEE Transactions on Computers
Two-level hierarchical register file organization for VLIW processors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Lifetime-Sensitive Modulo Scheduling in a Production Environment

IEEE Transactions on Computers
Register pressure responsive software pipelining

Proceedings of the 2001 ACM symposium on Applied computing
Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching

Journal of VLSI Signal Processing Systems
Power-aware modulo scheduling for high-performance VLIW processors

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Scheduling time-constrained instructions on pipelined processors

ACM Transactions on Programming Languages and Systems (TOPLAS)
Loop Transformations for Architectures with Partitioned Register Banks

OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
ShiftQ: a buffered interconnect for custom loop accelerators

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures

IEEE Transactions on Computers
Evaluating the Use of Register Queues in Software Pipelined Loops

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Loop fusion for clustered VLIW architectures

Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
A comparative study of modulo scheduling techniques

ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Enhancing loop buffering of media and telecommunications applications using low-overhead predication

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Optimized Unrolling of Nested Loops

International Journal of Parallel Programming
Cycle-time aware architecture synthesis of custom hardware accelerators

CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Code coverage and input variability: effects on architecture and compiler research

CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
On achieving balanced power consumption in software pipelined loops

CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
TimeC: A Time Constraint Language for ILP Processor Compilation

Constraints
CALiBeR: a software pipelining algorithm for clustered embedded VLIW processors

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory

International Journal of Parallel Programming
Handling Global Constraints in Compiler Strategy

International Journal of Parallel Programming
Computation in the Context of Transport Triggered Architectures

International Journal of Parallel Programming
Meld Scheduling: A Technique for Relaxing Scheduling Constraints

International Journal of Parallel Programming
Optimization of Machine Descriptions for Efficient Use

International Journal of Parallel Programming
Control Flow Regeneration for Software Pipelined Loops with Conditions

International Journal of Parallel Programming
The Intel IA-64 Compiler Code Generator

IEEE Micro
A finite state machine based formal model of software pipelined loops with conditions

Progress in computer research
Loop Shifting for Loop Compaction

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Optimizing Loop Performance for Clustered VLIW Architectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Pseudo-vectorizing Compiler for the SR8000 (Research Note)

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Influence of Variable Time Operations in Static Instruction Scheduling

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Global Software Pipelining with Iteration Preselection

CC '00 Proceedings of the 9th International Conference on Compiler Construction
Software Pipelining of Nested Loops

CC '01 Proceedings of the 10th International Conference on Compiler Construction
Reduced code size modulo scheduling in the absence of hardware support

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Predicate-aware scheduling: a technique for reducing resource constraints

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Split-Path Enhanced Pipeline Scheduling

IEEE Transactions on Parallel and Distributed Systems
Adapting instruction level parallelism for optimizing leakage in VLIW architectures

Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Meta optimization: improving compiler heuristics with machine learning

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Region-based hierarchical operation partitioning for multicluster processors

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Co-Scheduling Hardware and Software Pipelines

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors

ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
RTGEN: An Algorithm for Automatic Generation of Reservation Tables from Architectural Descriptions

Proceedings of the 12th international symposium on System synthesis
Register-Sensitive Software Pipelining

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
An Enhanced Co-Scheduling Method using Reduced MS-State Diagrams

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Code size reduction technique and implementation for software-pipelined DSP applications

ACM Transactions on Embedded Computing Systems (TECS)
RTGEN: an algorithm for automatic generation of reservation tables from architectural descriptions

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Custom Wide Counterflow Pipelines for High-Performance Embedded Applications

IEEE Transactions on Computers
An experimental evaluation of scalar replacement on scientific benchmarks

Software—Practice & Experience
Register Constrained Modulo Scheduling

IEEE Transactions on Parallel and Distributed Systems
Single-Dimension Software Pipelining for Multi-Dimensional Loops

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Probabilistic Predicate-Aware Modulo Scheduling

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A loop accelerator for low power embedded VLIW processors

Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
xDSPcore: A Compiler-Based Configurable Digital Signal Processor

IEEE Micro
Compiler-Directed ILP Extraction for Clustered VLIW/EPIC Machines: Predication, Speculation and Modulo Scheduling

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Dynamic loop pipelining in data-driven architectures

Proceedings of the 2nd conference on Computing frontiers
Demystifying on-the-fly spill code

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Automatically partitioning packet processing applications for pipelined architectures

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Automatic multithreading and multiprocessing of C programs for IXP

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Distributed Data Cache Designs for Clustered VLIW Processors

IEEE Transactions on Computers
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Combining module selection and resource sharing for efficient FPGA pipeline synthesis

Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
Register aware scheduling for distributed cache clustered architecture

ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Specific optimization features in a C compiler for DSPs

Programming and Computing Software
Compiler transformations for effectively exploiting a zero overhead loop buffer

Software—Practice & Experience
Generic software pipelining at the assembly level

SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Streamroller: automatic synthesis of prescribed throughput accelerator pipelines

CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Increasing hardware efficiency with multifunction loop accelerators

CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Merging Head and Tail Duplication for Convergent Hyperblock Formation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Single-dimension software pipelining for multidimensional loops

ACM Transactions on Architecture and Code Optimization (TACO)
Hybrid multi-core architecture for boosting single-threaded performance

ACM SIGARCH Computer Architecture News
A unified evaluation framework for coarse grained reconfigurable array architectures

Proceedings of the 4th international conference on Computing frontiers
Trident: From High-Level Language to Hardware Circuitry

Computer
Partitioning and scheduling DSP applications with maximal memory access hiding

EURASIP Journal on Applied Signal Processing
Efficient implementation of nested-loop multimedia algorithms

EURASIP Journal on Applied Signal Processing
Non-transparent debugging for software-pipelined loops

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Application driven embedded system design: a face recognition case study

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Hierarchical coarse-grained stream compilation for software defined radio

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Facilitating compiler optimizations through the dynamic mapping of alternate register structures

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Code-size conscious pipelining of imperfectly nested loops

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Latency-tolerant software pipelining in a production compiler

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Modulo scheduling for highly customized datapaths to increase hardware reusability

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Optimal vs. heuristic integrated code generation for clustered VLIW architectures

SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
Optimizing code through iterative specialization

Proceedings of the 2008 ACM symposium on Applied computing
Dynamic configuration of application-specific implicit instructions for embedded pipelined processors

Proceedings of the 2008 ACM symposium on Applied computing
Optimized mapping for enhancing the operation parallelism in coarse-grained reconfigurable arrays

SMO'06 Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization
Rotating register allocation with multiple rotating branches

Proceedings of the 22nd annual international conference on Supercomputing
Orchestrating the execution of stream programs on multicore platforms

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Register allocation for software pipelined multidimensional loops

ACM Transactions on Programming Languages and Systems (TOPLAS)
VEAL: Virtualized Execution Accelerator for Loops

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
DVFS in loop accelerators using BLADES

Proceedings of the 45th annual Design Automation Conference
Programming Reconfigurable Decoupled Application Control Accelerator for Mobile Systems

ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
Timing optimization via nest-loop pipelining considering code size

Microprocessors & Microsystems
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture

Languages and Compilers for Parallel Computing
Coordinated concurrent memory accesses on a reconfigurable multimedia accelerator

Microprocessors & Microsystems
Integrated Modulo Scheduling for Clustered VLIW Architectures

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Software Pipelining in Nested Loops with Prolog-Epilog Merging

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
SPR: an architecture-adaptive CGRA mapping tool

Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Resource aware mapping on coarse grained reconfigurable arrays

Microprocessors & Microsystems
Design and implementation of a queue compiler

Microprocessors & Microsystems
Improving performance of simple cores by exploiting loop-level parallelism through value prediction and reconfiguration

Proceedings of the 6th ACM conference on Computing frontiers
Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays

The Journal of Supercomputing
Modulo scheduling without overlapped lifetimes

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Synergistic execution of stream programs on multicores with accelerators

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Stream Compilation for Real-Time Embedded Multicore Systems

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
CGRA express: accelerating execution using dynamic operation fusion

CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Scheduling with soft constraints

Proceedings of the 2009 International Conference on Computer-Aided Design
Automatic memory partitioning and scheduling for throughput and power optimization

Proceedings of the 2009 International Conference on Computer-Aided Design
Input-driven dynamic execution prediction of streaming applications

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Compiling for reconfigurable computing: A survey

ACM Computing Surveys (CSUR)
Register allocation and optimal spill code scheduling in software pipelined loops using 0-1 integer linear programming formulation

CC'07 Proceedings of the 16th international conference on Compiler construction
Genetic programming applied to compiler heuristic optimization

EuroGP'03 Proceedings of the 6th European conference on Genetic programming
The implementation of a coarse-grained reconfigurable architecture with loop self-pipelining

ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
MIRS: modulo scheduling with integrated register spilling

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Minimizing communication in rate-optimal software pipelining for stream programs

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
An algorithm to improve parallelism in distributed systems using asynchronous calls

PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Towards a source level compiler: source level modulo scheduling

Program analysis and compilation, theory and practice
Buffer-space efficient and deadlock-free scheduling of stream applications on multi-core architectures

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
MEDICS: ultra-portable processing for medical image reconstruction

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Reuse-aware modulo scheduling for stream processors

Proceedings of the Conference on Design, Automation and Test in Europe
Resource recycling: putting idle resources to work on a composable accelerator

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Fine-grain dynamic instruction placement for L0 scratch-pad memory

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Compilers, architectures and synthesis for embedded computing: retrospect and prospect

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Exploring the design space of an optimized compiler approach for mesh-like coarse-grained reconfigurable architectures

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A design scheme for a reconfigurable accelerator implemented by single-flux quantum circuits

Journal of Systems Architecture: the EUROMICRO Journal
A graph drawing based spatial mapping algorithm for coarse-grained reconfigurable architectures

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Precedence constraint posting for cyclic scheduling problems

CPAIOR'11 Proceedings of the 8th international conference on Integration of AI and OR techniques in constraint programming for combinatorial optimization problems
Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture

The Journal of Supercomputing
Worst case analysis of decomposed software pipelining for cyclic unitary RCPSP with precedence delays

Journal of Scheduling
An FPGA-based heterogeneous coarse-grained dynamically reconfigurable architecture

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
A constraint based approach to cyclic RCPSP

CP'11 Proceedings of the 17th international conference on Principles and practice of constraint programming
Efficient Spilling Reduction for Software Pipelined Loops in Presence of Multiple Register Types in Embedded VLIW Processors

ACM Transactions on Embedded Computing Systems (TECS)
Register pressure in software-pipelined loop nests: fast computation and impact on architecture design

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Combined ILP and register tiling: analytical model and optimization framework

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Improving performance of nested loops on reconfigurable array processors

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors

The Journal of Supercomputing
SCAN: a heuristic for near-optimal software pipelining

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Multi-dimensional kernel generation for loop nest software pipelining

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Dataflow-driven execution control in a coarse-grained reconfigurable array (abstract only)

Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Integrated Code Generation for Loops

ACM Transactions on Embedded Computing Systems (TECS)
EPIMap: using epimorphism to map applications on CGRAs

Proceedings of the 49th Annual Design Automation Conference
Deadline constrained cyclic scheduling on pipelined dedicated processors considering multiprocessor tasks and changeover times

Mathematical and Computer Modelling: An International Journal
Global cyclic cumulative constraint

CPAIOR'12 Proceedings of the 9th international conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems
Function inlining and loop unrolling for loop acceleration in reconfigurable processors

Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Compositional approach applied to loop specialization

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
High performance FFT on SGI Altix 3700

HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Execution Time Optimization Using Delayed Multidimensional Retiming

DS-RT '12 Proceedings of the 2012 IEEE/ACM 16th International Symposium on Distributed Simulation and Real Time Applications
Memory partitioning and scheduling co-optimization in behavioral synthesis

Proceedings of the International Conference on Computer-Aided Design
The resource-constrained modulo scheduling problem: an experimental study

Computational Optimization and Applications
Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
REGIMap: register-aware application mapping on coarse-grained reconfigurable architectures (CGRAs)

Proceedings of the 50th Annual Design Automation Conference
Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors

ACM Transactions on Embedded Computing Systems (TECS)
The benefits of using variable-length pipelined operations in high-level synthesis

ACM Transactions on Embedded Computing Systems (TECS)
Just-In-Time Software Pipelining

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures

ACM Transactions on Architecture and Code Optimization (TACO)
Evaluator-executor transformation for efficient pipelining of loops with conditionals

ACM Transactions on Architecture and Code Optimization (TACO)
SDC-based modulo scheduling for pipeline synthesis

Proceedings of the International Conference on Computer-Aided Design
CROSS cyclic resource-constrained scheduling solver

Artificial Intelligence
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture

Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems

Abstract

Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.