In this paper, we look at the interaction of pipelining and multiple functional units in single-processor machines. When implementing a high-performance machine, a number of hardware techniques may be used to improve the performance of the final system. Our goal is to understand how much each of these techniques contributes to performance improvement. As a basis for our studies, we use a non-pipelined, CRAY-like processor model, with the issue rate (instructions issued per clock cycle) as the performance measure. We then systematically augment this base machine with more and more hardware features and evaluate the performance impact of each feature. We find, for example, that in non-vector machines, pipelining multiple functional units does not provide significant performance improvements. Dataflow limits are then derived for our benchmark programs to determine the performance potential of each benchmark. In addition, we compute other limits that apply more realistic constraints on a computation. Based on these more realistic limits, we conclude that it is worthwhile to investigate the performance improvements achievable by issuing multiple instructions each clock cycle. Several hardware approaches for issuing multiple instructions each clock cycle are evaluated.
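The dataflow limit and the issue-rate measure can be illustrated with a small sketch. This is not the authors' method, only a minimal example under stated assumptions: given a dynamic instruction trace, each instruction starts as soon as its source registers are ready, with unlimited functional units and perfect register renaming, so only read-after-write dependences constrain execution. For contrast, it also computes a fully serial schedule in which no instructions overlap. The trace format, register names, latencies, and helper names (`Instr`, `dataflow_cycles`, `serial_cycles`, `issue_rate`) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    """One instruction from a dynamic trace (hypothetical format)."""
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read
    latency: int           # cycles until the result is available


def dataflow_cycles(trace: Sequence[Instr]) -> int:
    """Cycles needed when only true (read-after-write) dependences matter:
    unlimited functional units, perfect renaming, no issue-width limit."""
    ready: Dict[str, int] = {}  # register -> cycle its latest value is ready
    finish = 0
    for ins in trace:
        # An instruction starts once all of its operands are available.
        start = max((ready.get(r, 0) for r in ins.srcs), default=0)
        done = start + ins.latency
        if ins.dest is not None:
            # Later readers (in program order) see this writer's result.
            ready[ins.dest] = done
        finish = max(finish, done)
    return finish


def serial_cycles(trace: Sequence[Instr]) -> int:
    """Cycles if instructions execute one at a time with no overlap."""
    return sum(ins.latency for ins in trace)


def issue_rate(n_instrs: int, cycles: int) -> float:
    """Performance measure: instructions per clock cycle."""
    return n_instrs / cycles


# Hypothetical trace; latencies are made up for illustration.
trace = [
    Instr("S1", (), 2),            # load a
    Instr("S2", (), 2),            # load b
    Instr("S3", ("S1", "S2"), 6),  # S3 = a + b
    Instr("S4", (), 2),            # load c
    Instr("S5", ("S3", "S4"), 7),  # S5 = S3 * c (critical path)
    Instr("S6", ("S4",), 1),       # S6 = f(c)
]

if __name__ == "__main__":
    df, ser = dataflow_cycles(trace), serial_cycles(trace)
    print(f"dataflow limit: {df} cycles, {issue_rate(len(trace), df):.2f} instr/cycle")
    # -> dataflow limit: 15 cycles, 0.40 instr/cycle
    print(f"serial: {ser} cycles, {issue_rate(len(trace), ser):.2f} instr/cycle")
    # -> serial: 20 cycles, 0.30 instr/cycle
```

Here the dataflow limit is set by the critical path S1/S2 → S3 → S5 (2 + 6 + 7 = 15 cycles). Adding more functional units cannot shorten that path, which is why a dataflow limit bounds the achievable issue rate for a given program.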