Highly concurrent scalar processing
Highly concurrent scalar processing
HPS, a new microarchitecture: rationale and introduction
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
A VLIW architecture for a trace scheduling compiler
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Compile time optimization of memory and register usage on the Cray 2
Selected papers of the second workshop on Languages and compilers for parallel computing
A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1
Computer - Special issue on experimental research in computer architecture
Performance bounds and buffer space requirements for concurrent processors
Performance bounds and buffer space requirements for concurrent processors
Vector register design for polycyclic vector scheduling
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Decoupled access/execute computer architectures
ACM Transactions on Computer Systems (TOCS)
A Fortran compiler for the FPS-164 scientific computer
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
Improving the throughput of a pipeline by insertion of delays
ISCA '76 Proceedings of the 3rd annual symposium on Computer architecture
A systolic array optimizing compiler
A systolic array optimizing compiler
Minimum register requirements for a modulo schedule
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Optimum modulo schedules for minimum register requirements
ICS '95 Proceedings of the 9th international conference on Supercomputing
Register allocation for predicated code
Proceedings of the 28th annual international symposium on Microarchitecture
Unrolling-based optimizations for modulo scheduling
Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule
Proceedings of the 28th annual international symposium on Microarchitecture
Hypernode reduction modulo scheduling
Proceedings of the 28th annual international symposium on Microarchitecture
Heuristics for register-constrained software pipelining
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs
ICS '97 Proceedings of the 11th international conference on Supercomputing
Modulo Scheduling with Reduced Register Pressure
IEEE Transactions on Computers
Quantitative Evaluation of Register Pressure on Software Pipelined Loops
International Journal of Parallel Programming
Effective cluster assignment for modulo scheduling
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improved spill code generation for software pipelined loops
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
ACM SIGPLAN Notices
Lifetime-Sensitive Modulo Scheduling in a Production Environment
IEEE Transactions on Computers
Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching
Journal of VLSI Signal Processing Systems
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Evaluating the Use of Register Queues in Software Pipelined Loops
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Non-Consistent Dual Register Files to Reduce Register Pressure
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems
Partitioning and scheduling DSP applications with maximal memory access hiding
EURASIP Journal on Applied Signal Processing
On Periodic Register Need in Software Pipelining
IEEE Transactions on Computers
MIRS: modulo scheduling with integrated register spilling
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Hi-index | 0.01 |
To enable concurrent instruction execution, scientific computers generally rely on pipelining, which combines with faster system clocks to achieve greater throughput. Each concurrently executing instruction requires buffer space, usually implemented as a register, to receive its result. This paper focuses on the issue of how many registers are required to achieve optimal performance in pipelined scientific computers. Four machine models are considered: single, double, and triple issue scalar machines, and vector machines with various register lengths. A model is presented that accurately relates the register requirements for optimum performance cyclically scheduled loops with tree-dependence graphs to the degree of function unit pipelining, the instruction issue bandwidth, and code properties. A method for finding upper and lower bounds on the minimum register requirements is also presented.The result of this work is a theory for assessing register requirements that can be used to reveal fundamental differences among machines within a space of architectural and implementation design choices. Some experimental data is also provided to support the theory.