Performance evaluation for various configuration of superscalar processors

Authors:
Atsushi Inoue;Kenji Takeda
Affiliations:
-;-
Venue:
ACM SIGARCH Computer Architecture News
Year:
1993

Citing 5
Cited 1

Efficient instruction scheduling for a pipelined architecture

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
A study of scalar compilation techniques for pipelined supercomputers

ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Scheduling expressions on a pipelined processor with a maximal delay of one cycle

ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic loop interchange

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction

A Framework for Computer Performance Evaluation Using Benchmark Sets

IEEE Transactions on Computers

Abstract

We present evaluation results for various configurations of superscalar processors. We have developed tools for evaluating superscalar processors, consisting of an optimizing compiler and a configurable simulator. These tools accept architectural parameters that characterize the configuration of the target superscalar processor. We can estimate the best achievable performance of each configuration, because the compiler extracts as much parallelism from the application programs as it can and execution on the simulator reflects the actual behavior of the target processor. By comparing execution results across processor configurations, we can observe the effect of each architectural parameter and identify the optimal configuration of a superscalar processor.

We have evaluated performance while varying the following specifications of superscalar processors: (1) the maximum number of instructions the processor can issue in each cycle; (2) the number of functional units and memory ports; (3) the latency of instructions. The results show that issuing four instructions per cycle attains sufficient performance, and that the integer unit should have two ALUs and two memory ports for most of the integer programs tested here. For floating-point programs, if the programs are highly optimized, reducing instruction latency is more effective than adding functional units for enhancing the performance of superscalar processors.
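To make the three parameters concrete, the following is a minimal sketch of a parameterized cycle-count model. It is not the authors' simulator: the names `Instr`, `Config`, and `simulate` are hypothetical, and the model assumes in-order issue, fully pipelined functional units, latencies of at least one cycle, and no modelling of branches, caches, or write-after-write hazards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    kind: str          # functional-unit class, e.g. "alu" or "mem" (hypothetical names)
    dest: str          # register written by the instruction
    srcs: tuple = ()   # registers read by the instruction


@dataclass
class Config:
    issue_width: int   # (1) maximum instructions issued per cycle
    units: dict        # (2) kind -> number of functional units or memory ports
    latency: dict      # (3) kind -> result latency in cycles (assumed >= 1)


def simulate(program, cfg):
    """Return the cycle at which the last result becomes available."""
    if cfg.issue_width < 1:
        raise ValueError("issue_width must be at least 1")
    for ins in program:
        if cfg.units.get(ins.kind, 0) < 1 or cfg.latency.get(ins.kind, 0) < 1:
            raise ValueError(f"no unit or invalid latency for kind {ins.kind!r}")

    ready_at = {}      # register -> cycle at which its value is available
    cycle = 0
    finish = 0
    i = 0
    while i < len(program):
        issued = 0
        used = {}      # kind -> units already claimed in this cycle
        while i < len(program) and issued < cfg.issue_width:
            ins = program[i]
            operands_ready = all(ready_at.get(r, 0) <= cycle for r in ins.srcs)
            unit_free = used.get(ins.kind, 0) < cfg.units[ins.kind]
            if not (operands_ready and unit_free):
                break  # in-order issue: later instructions wait as well
            done = cycle + cfg.latency[ins.kind]
            ready_at[ins.dest] = done
            finish = max(finish, done)
            used[ins.kind] = used.get(ins.kind, 0) + 1
            issued += 1
            i += 1
        cycle += 1
    return finish


if __name__ == "__main__":
    # Two load-load-add groups followed by a final add of their results.
    program = [
        Instr("mem", "r1"), Instr("mem", "r2"), Instr("alu", "r3", ("r1", "r2")),
        Instr("mem", "r4"), Instr("mem", "r5"), Instr("alu", "r6", ("r4", "r5")),
        Instr("alu", "r7", ("r3", "r6")),
    ]
    latency = {"alu": 1, "mem": 2}
    scalar = Config(issue_width=1, units={"alu": 1, "mem": 1}, latency=latency)
    wide = Config(issue_width=4, units={"alu": 2, "mem": 2}, latency=latency)
    print("scalar cycles:", simulate(program, scalar))
    print("4-issue cycles:", simulate(program, wide))
```

Sweeping `issue_width`, `units`, and `latency` over the same program and comparing the returned cycle counts mirrors, in miniature, the kind of configuration comparison the abstract describes. The numbers from such a toy model say nothing about the paper's actual measurements.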