Register requirements of pipelined processors

Authors:
William Mangione-Smith;Santosh G. Abraham;Edward S. Davidson
Affiliations:
-;-;-
Venue:
ICS '92 Proceedings of the 6th international conference on Supercomputing
Year:
1992

Citing 14
Cited 22

Highly concurrent scalar processing

Highly concurrent scalar processing
HPS, a new microarchitecture: rationale and introduction

MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
A VLIW architecture for a trace scheduling compiler

ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions, and Trade-Offs

Computer
Compile time optimization of memory and register usage on the Cray 2

Selected papers of the second workshop on Languages and compilers for parallel computing
A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1

Computer - Special issue on experimental research in computer architecture
Performance bounds and buffer space requirements for concurrent processors

Performance bounds and buffer space requirements for concurrent processors
Vector register design for polycyclic vector scheduling

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Decoupled access/execute computer architectures

ACM Transactions on Computer Systems (TOCS)
A Fortran compiler for the FPS-164 scientific computer

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Computers and Intractability: A Guide to the Theory of NP-Completeness

Computers and Intractability: A Guide to the Theory of NP-Completeness
Efficient code generation for horizontal architectures: Compiler techniques and architectural support

ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Improving the throughput of a pipeline by insertion of delays

ISCA '76 Proceedings of the 3rd annual symposium on Computer architecture
A systolic array optimizing compiler

A systolic array optimizing compiler

Minimum register requirements for a modulo schedule

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Optimum modulo schedules for minimum register requirements

ICS '95 Proceedings of the 9th international conference on Supercomputing
Register allocation for predicated code

Proceedings of the 28th annual international symposium on Microarchitecture
Unrolling-based optimizations for modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule

Proceedings of the 28th annual international symposium on Microarchitecture
Hypernode reduction modulo scheduling

Proceedings of the 28th annual international symposium on Microarchitecture
Heuristics for register-constrained software pipelining

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs

ICS '97 Proceedings of the 11th international conference on Supercomputing
Modulo Scheduling with Reduced Register Pressure

IEEE Transactions on Computers
Quantitative Evaluation of Register Pressure on Software Pipelined Loops

International Journal of Parallel Programming
Effective cluster assignment for modulo scheduling

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improved spill code generation for software pipelined loops

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Communication scheduling

ACM SIGPLAN Notices
Lifetime-Sensitive Modulo Scheduling in a Production Environment

IEEE Transactions on Computers
Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching

Journal of VLSI Signal Processing Systems
Communication scheduling

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Evaluating the Use of Register Queues in Software Pipelined Loops

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Non-Consistent Dual Register Files to Reduce Register Pressure

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Register Constrained Modulo Scheduling

IEEE Transactions on Parallel and Distributed Systems
Partitioning and scheduling DSP applications with maximal memory access hiding

EURASIP Journal on Applied Signal Processing
On Periodic Register Need in Software Pipelining

IEEE Transactions on Computers
MIRS: modulo scheduling with integrated register spilling

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing

Quantified Score

Hi-index	0.01

Visualization

Abstract

To enable concurrent instruction execution, scientific computers generally rely on pipelining, which combines with faster system clocks to achieve greater throughput. Each concurrently executing instruction requires buffer space, usually implemented as a register, to receive its result. This paper focuses on the issue of how many registers are required to achieve optimal performance in pipelined scientific computers. Four machine models are considered: single, double, and triple issue scalar machines, and vector machines with various register lengths. A model is presented that accurately relates the register requirements for optimum performance cyclically scheduled loops with tree-dependence graphs to the degree of function unit pipelining, the instruction issue bandwidth, and code properties. A method for finding upper and lower bounds on the minimum register requirements is also presented.The result of this work is a theory for assessing register requirements that can be used to reveal fundamental differences among machines within a space of architectural and implementation design choices. Some experimental data is also provided to support the theory.