Parallelization of loops with exits on pipelined architectures

Authors:
P. Tirumalai;M. Lee;M. Schlansker
Affiliations:
Hewlett-Packard Laboratories, Mailstop 3u-7, 1501 Page Mill Road, Palo Alto, CA;Hewlett-Packard Laboratories, Mailstop 3u-7, 1501 Page Mill Road, Palo Alto, CA;Hewlett-Packard Laboratories, Mailstop 3u-7, 1501 Page Mill Road, Palo Alto, CA
Venue:
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Year:
1990

Citing 12
Cited 31

Bulldog: a compiler for VLSI architectures

Bulldog: a compiler for VLSI architectures
Compilers: principles, techniques, and tools

Compilers: principles, techniques, and tools
Numerical recipes: the art of scientific computing

Numerical recipes: the art of scientific computing
A VLIW architecture for a trace scheduling compiler

ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
Overlapped loop support in the Cydra 5

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
A Fortran compiler for the FPS-164 scientific computer

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Conversion of control dependence to data dependence

POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Very Long Instruction Word architectures and the ELI-512

ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Control and data dependence for program transformations.

Control and data dependence for program transformations.

Register allocation for software pipelined loops

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Sentinel scheduling for VLIW and superscalar processors

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Code generation schema for modulo scheduled loops

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Enhanced modulo scheduling for loops with conditional branches

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Sentinel scheduling: a model for compiler-controlled speculative execution

ACM Transactions on Computer Systems (TOCS)
Height reduction of control recurrences for ILP processors

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Minimum register requirements for a modulo schedule

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Software pipelining

ACM Computing Surveys (CSUR)
Optimum modulo schedules for minimum register requirements

ICS '95 Proceedings of the 9th international conference on Supercomputing
Using predicated execution to improve the performance of a dynamically scheduled machine with speculative execution

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Critical path reduction for scalar programs

Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling with multiple initiation intervals

Proceedings of the 28th annual international symposium on Microarchitecture
Region-based compilation: an introduction and motivation

Proceedings of the 28th annual international symposium on Microarchitecture
Unrolling-based optimizations for modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule

Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Heuristics for register-constrained software pipelining

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Software pipelining loops with conditional branches

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Modulo Scheduling with Reduced Register Pressure

IEEE Transactions on Computers
Quantitative Evaluation of Register Pressure on Software Pipelined Loops

International Journal of Parallel Programming
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Control CPR: a branch height reduction optimization for EPIC architectures

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Lifetime-Sensitive Modulo Scheduling in a Production Environment

IEEE Transactions on Computers
Software Pipelining Irregular Loops On the TMS320C6000 VLIW DSP Architecture

OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
The Intel IA-64 Compiler Code Generator

IEEE Micro
Architecture and Performance of the Hitachi SR2201 Massively Parallel Processor System

IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Reduced code size modulo scheduling in the absence of hardware support

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Register Constrained Modulo Scheduling

IEEE Transactions on Parallel and Distributed Systems

Quantified Score

Hi-index	0.01

Visualization

Abstract

Modulo scheduling theory can be applied successfully to overlap Fortran DO loops on pipelined computers issuing multiple operations per cycle both with and without special loop architectural support [1, 2, 3]. This paper shows that a broader class of loops - repeat-until, while, and loops with more than one exit - where the trip count is not known beforehand, can also be overlapped efficiently on multiple issue pipelined machines. Special features that are required in the architecture, as well as compiler representations for accelerating these loop constructs, are discussed. Performance results are presented for a few select examples. A prototype scheduler is currently under construction at HP Laboratories.