Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

Authors:
B. R. Rau; C. D. Glaeser
Venue:
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Year:
1981

Abstract

Horizontal architectures are attractive for cost-effective, high performance scientific computing. They are, however, very difficult to schedule. Consequently, it is difficult to develop compilers that can generate efficient code for such architectures. The polycyclic architecture has been developed specifically to make the task of scheduling easy. As a result, it has been possible to develop a powerful scheduling algorithm that yields optimal and near-optimal schedules for iterative computations. This novel architecture and this scheduling algorithm are the topic of this paper.
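
To make the idea of scheduling an iterative computation concrete, here is a minimal sketch in Python. It is not the paper's algorithm, which the abstract does not specify. It is a simplified greedy scheduler in the style that later literature calls modulo scheduling. The function names, the operation tuples, and the example loop body are all hypothetical. The sketch assumes that each operation occupies one functional unit for a single cycle and that there are no dependences between iterations.

```python
from math import ceil


def resource_mii(ops, units):
    """Lower bound on the initiation interval from resource usage alone.

    ops:   list of (name, resource, latency) tuples (hypothetical format)
    units: dict mapping resource name -> number of identical units
    """
    usage = {}
    for _, resource, _ in ops:
        usage[resource] = usage.get(resource, 0) + 1
    return max(ceil(count / units[res]) for res, count in usage.items())


def modulo_schedule(ops, deps, units):
    """Greedily place one iteration's operations at a fixed initiation interval.

    ops must be listed in topological order; deps maps an operation name to
    the names of its same-iteration predecessors. Assumes (for this sketch)
    that each operation occupies its unit for one cycle and that there are
    no loop-carried dependences.
    """
    ii = resource_mii(ops, units)
    latency = {name: lat for name, _, lat in ops}
    reserved = {}  # (resource, cycle mod ii) -> units already in use
    start = {}
    for name, res, _ in ops:
        # Earliest cycle at which all predecessors have produced results.
        earliest = max(
            (start[p] + latency[p] for p in deps.get(name, [])), default=0
        )
        # Trying ii consecutive cycles visits every slot of the modulo table,
        # and at ii >= resource_mii a free slot always exists.
        for t in range(earliest, earliest + ii):
            key = (res, t % ii)
            if reserved.get(key, 0) < units[res]:
                reserved[key] = reserved.get(key, 0) + 1
                start[name] = t
                break
    return ii, start


# Hypothetical loop body: c[i] = a[i] * b[i] + k
ops = [
    ("load_a", "mem", 2),
    ("load_b", "mem", 2),
    ("mul", "mul", 3),
    ("add", "add", 1),
    ("store", "mem", 1),
]
deps = {"mul": ["load_a", "load_b"], "add": ["mul"], "store": ["add"]}
units = {"mem": 1, "mul": 1, "add": 1}

ii, start = modulo_schedule(ops, deps, units)
print(ii, start)
```

With one memory port shared by three memory operations, the sketch starts a new iteration every three cycles. A single iteration still spans several such intervals, so successive iterations overlap in time.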