Modulo scheduling of loops in control-intensive non-numeric programs

Authors:
Daniel M. Lavery;Wen-mei W. Hwu
Affiliations:
Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL;Center for Reliable and High-Performance Computing, University of Illinois, Urbana-Champaign, IL
Venue:
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Year:
1996

Abstract

Much of the previous work on modulo scheduling has targeted numeric programs, in which most loops are often well-behaved, counter-based loops without early exits. In control-intensive non-numeric programs, the loops frequently have characteristics that make it more difficult to apply modulo scheduling effectively. These characteristics include multiple control flow paths, loops that are not based on a loop counter, and multiple exits. In these loops, the presence of unimportant paths with high resource usage or long dependence chains can penalize the important paths. A path that contains a hazard such as another nested loop can prohibit modulo scheduling of the loop. Control dependences can severely restrict the overlap of the blocks within and across iterations. This paper describes a set of methods that allow effective modulo scheduling of loops with multiple exits. The techniques include removal of control dependences to enable speculation, extensions to modulo variable expansion, and a new epilogue generation scheme. These methods can be used with superblock and hyperblock techniques to allow modulo scheduling of the selected paths of loops with arbitrary control flow. A case study is presented to show how these methods, combined with superblock techniques, enable modulo scheduling to be effectively applied to control-intensive non-numeric programs. Performance results for several SPEC CINT92 benchmarks and Unix utility programs are reported and demonstrate the applicability of modulo scheduling to this class of programs.
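
The claim that unimportant paths with high resource usage can penalize the important paths can be illustrated with the standard resource-constrained lower bound on the initiation interval (often written ResMII). The following is a minimal sketch, not the paper's method: the machine description, the operation counts, and the helper name `res_mii` are all hypothetical, and it assumes that every path folded into the modulo-scheduled loop body reserves resources in every iteration.

```python
from math import ceil

def res_mii(op_counts, units):
    """Resource-constrained lower bound on the initiation interval.

    op_counts: operations per iteration, keyed by resource class.
    units:     number of functional units available per resource class.
    """
    return max(ceil(op_counts.get(r, 0) / n) for r, n in units.items())

# Hypothetical machine: 2 ALUs and 1 memory port.
units = {"alu": 2, "mem": 1}

# Hypothetical per-iteration operation counts for two paths through a loop.
hot_path = {"alu": 4, "mem": 2}   # frequently executed path
cold_path = {"alu": 3, "mem": 4}  # rarely executed, memory-heavy path

# Assumption: when both paths are scheduled in one loop body, their
# resource requirements add up in every iteration.
all_paths = {r: hot_path.get(r, 0) + cold_path.get(r, 0) for r in units}

print(res_mii(hot_path, units))   # 2
print(res_mii(all_paths, units))  # 6
```

Under these assumed numbers, scheduling only the frequently executed path allows a new iteration to start every 2 cycles, whereas including the rarely executed path raises the bound to 6 cycles. This is the kind of penalty that selecting paths, for example with superblock techniques, is meant to avoid.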