Swing Modulo Scheduling: A Lifetime-Sensitive Approach

Authors:
Josep Llosa
Affiliations:
-
Venue:
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Year:
1996

Citing 0
Cited 52

Cache sensitive modulo scheduling

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Quantitative Evaluation of Register Pressure on Software Pipelined Loops

International Journal of Parallel Programming
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Loop Shifting for Loop Compaction

International Journal of Parallel Programming - Special issue on instruction-level parallelism and parallelizing compilation, part 2
Lifetime-Sensitive Modulo Scheduling in a Production Environment

IEEE Transactions on Computers
RS-FDRA: a register sensitive software pipelining algorithm for embedded VLIW processors

Proceedings of the ninth international symposium on Hardware/software codesign
Instruction scheduling for clustered VLIW architectures

ISSS '00 Proceedings of the 13th international symposium on System synthesis
A comparative study of modulo scheduling techniques

ICS '02 Proceedings of the 16th international conference on Supercomputing
An interleaved cache clustered VLIW processor

ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
CALiBeR: a software pipelining algorithm for clustered embedded VLIW processors

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory

International Journal of Parallel Programming
Loop Shifting for Loop Compaction

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Reduced code size modulo scheduling in the absence of hardware support

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Split-Path Enhanced Pipeline Scheduling

IEEE Transactions on Parallel and Distributed Systems
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Instruction Replication for Clustered Microarchitectures

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Removing communications in clustered microarchitectures through instruction replication

ACM Transactions on Architecture and Code Optimization (TACO)
Demystifying on-the-fly spill code

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Distributed Data Cache Designs for Clustered VLIW Processors

IEEE Transactions on Computers
Contributions to the GNU compiler collection

IBM Systems Journal
Variable-Based Multi-module Data Caches for Clustered VLIW Processors

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
A spill code minimization technique: application in the Metrowerks StarCore C compiler

International Journal of Parallel Programming
In search of a program generator to implement generic transformations for high-performance computing

Science of Computer Programming - Special issue on the first MetaOCaml workshop 2004
Heterogeneous Clustered VLIW Microarchitectures

Proceedings of the International Symposium on Code Generation and Optimization
VEAL: Virtualized Execution Accelerator for Loops

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Critical Block Scheduling: A Thread-Level Parallelizing Mechanism for a Heterogeneous Chip Multiprocessor Architecture

Languages and Compilers for Parallel Computing
Integrated Modulo Scheduling for Clustered VLIW Architectures

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Modulo scheduling without overlapped lifetimes

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A simple, verified validator for software pipelining

Proceedings of the 37th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
MIRS: modulo scheduling with integrated register spilling

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs

ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Static speculation as post-link optimization for the Grid Alu processor

Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Worst case analysis of decomposed software pipelining for cyclic unitary RCPSP with precedence delays

Journal of Scheduling
An energy-efficient patchable accelerator for post-silicon engineering changes

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A constraint based approach to cyclic RCPSP

CP'11 Proceedings of the 17th international conference on Principles and practice of constraint programming
Register pressure in software-pipelined loop nests: fast computation and impact on architecture design

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors

The Journal of Supercomputing
Massively parallel programming models used as hardware description languages: the OpenCL case

Proceedings of the International Conference on Computer-Aided Design
Multi-dimensional kernel generation for loop nest software pipelining

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Integrated Code Generation for Loops

ACM Transactions on Embedded Computing Systems (TECS)
Global cyclic cumulative constraint

CPAIOR'12 Proceedings of the 9th international conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems
Analysis of Inner-Loop Mapping onto Coarse-Grained Reconfigurable Architectures Using Hybrid Particle Swarm Optimization

International Journal of Organizational and Collective Intelligence
A pattern-supported parallelization approach

Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores
The benefits of using variable-length pipelined operations in high-level synthesis

ACM Transactions on Embedded Computing Systems (TECS)
CROSS cyclic resource-constrained scheduling solver

Artificial Intelligence

Quantified Score

Hi-index	0.01

Abstract

This paper presents a novel software pipelining approach called Swing Modulo Scheduling (SMS). It generates schedules that are near-optimal in terms of initiation interval, register requirements, and stage count. SMS is a heuristic approach with a low computational cost. The paper describes the technique and evaluates it on the Perfect Club benchmark suite. Compared with other heuristic methods, SMS outperforms them in both the quality of the obtained schedules and compilation time. SMS is also compared with an integer linear programming approach that generates optimal schedules, but at a computational cost so high that it is feasible only for very small loops. For a set of small loops, SMS obtained the optimal initiation interval in every case, and its schedules required only 5% more registers and a 1% higher stage count than the optimum.
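
The abstract judges schedules by initiation interval, register requirements, and stage count. The sketch below illustrates how the last two could be measured for a given modulo schedule. It is not the SMS algorithm and is not taken from the paper: the function names, the four-operation loop, its latencies, and the half-open lifetime convention are all assumptions made for illustration, and definitions of both metrics vary between papers.

```python
from math import ceil

def stage_count(start, latency, ii):
    """Stages spanned by one iteration: schedule length divided by II, rounded up.

    start and latency map each (hypothetical) operation name to its issue
    cycle and its latency in cycles.
    """
    first = min(start.values())
    last = max(start[op] + latency[op] for op in start)
    return ceil((last - first) / ii)

def max_live(lifetimes, ii):
    """Peak number of simultaneously live values in the steady-state kernel.

    lifetimes holds one half-open interval [begin, end) per value (an assumed
    convention). Each covered cycle is folded into its slot modulo II, so a
    lifetime longer than II is counted once per overlapping iteration.
    """
    live = [0] * ii
    for begin, end in lifetimes:
        for cycle in range(begin, end):
            live[cycle % ii] += 1
    return max(live)

# Hypothetical loop body: load -> mul -> add -> store, scheduled with II = 2.
start = {"load": 0, "mul": 2, "add": 5, "store": 6}
latency = {"load": 2, "mul": 3, "add": 1, "store": 1}
lifetimes = [(0, 2), (2, 5), (5, 6)]  # values produced by load, mul, add

print(stage_count(start, latency, ii=2))
print(max_live(lifetimes, ii=2))
```

Under these assumptions, a lifetime-sensitive scheduler would prefer placements that shorten the intervals in lifetimes, since that lowers the peak returned by max_live without changing the initiation interval.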