We present evaluation results for various configurations of superscalar processors. We have developed a set of evaluation tools consisting of an optimizing compiler and a configurable simulator. Both tools accept architectural parameters that characterize the configuration of the target superscalar processor. The compiler extracts as much parallelism from the application programs as it can, and execution on the simulator reflects the actual behavior of the target processor, so we can estimate the best achievable performance of each configuration. By comparing execution results across processor configurations, we can observe the effect of each architectural parameter and determine an optimal superscalar configuration.

We evaluated performance while varying the following specifications of the superscalar processor: (1) the maximum number of instructions the processor can issue per cycle, (2) the number of functional units and memory ports, and (3) instruction latencies. The results show that issuing four instructions per cycle achieves sufficient performance, and that for most of the integer programs tested, the integer unit should have two ALUs and two memory ports. For highly optimized floating-point programs, reducing instruction latency improves superscalar performance more effectively than adding functional units.
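The abstract does not describe the simulator's internals. The following is a minimal, hypothetical sketch of how a timing model parameterized by the three specifications above (issue width, functional units and memory ports, instruction latencies) might compute a cycle count. It assumes in-order issue, fully pipelined units (each unit accepts one new instruction per cycle), and only true read-after-write dependences. The names `Config`, `Instr`, and `simulate` are illustrative and are not the authors' tools.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Config:
    issue_width: int          # max instructions issued per cycle (spec 1)
    units: Dict[str, int]     # unit class -> count, e.g. {"alu": 2, "mem": 2} (spec 2)
    latency: Dict[str, int]   # unit class -> result latency in cycles (spec 3)


@dataclass
class Instr:
    unit: str                  # unit class that executes this instruction
    dest: Optional[str] = None # register written, if any
    srcs: Tuple[str, ...] = () # registers read


def simulate(program, cfg):
    """Return the cycle at which the last result of `program` is available.

    Assumptions (hypothetical model): in-order issue, fully pipelined units,
    only read-after-write dependences are modeled (no WAR/WAW stalls).
    """
    for ins in program:
        if cfg.units.get(ins.unit, 0) < 1:
            raise ValueError(f"no functional unit of class {ins.unit!r}")

    ready = {}   # register -> cycle its value becomes available
    cycle = 0
    finish = 0
    i = 0
    while i < len(program):
        issued = 0
        used = {u: 0 for u in cfg.units}  # units occupied in this cycle
        while i < len(program) and issued < cfg.issue_width:
            ins = program[i]
            # Registers never written are live-ins, available at cycle 0.
            operands_ready = all(ready.get(r, 0) <= cycle for r in ins.srcs)
            if not operands_ready or used[ins.unit] >= cfg.units[ins.unit]:
                break  # in-order issue: a stalled instruction blocks later ones
            used[ins.unit] += 1
            done = cycle + cfg.latency[ins.unit]
            if ins.dest is not None:
                ready[ins.dest] = done
            finish = max(finish, done)
            issued += 1
            i += 1
        cycle += 1
    return finish


# Two loads feeding a short dependent ALU chain.
prog = [
    Instr("mem", "a"),
    Instr("mem", "b"),
    Instr("alu", "c", ("a", "b")),
    Instr("alu", "d", ("c",)),
]

base = Config(4, {"alu": 2, "mem": 2}, {"alu": 1, "mem": 2})
one_port = Config(4, {"alu": 2, "mem": 1}, {"alu": 1, "mem": 2})
fast_mem = Config(4, {"alu": 2, "mem": 2}, {"alu": 1, "mem": 1})
print(simulate(prog, base), simulate(prog, one_port), simulate(prog, fast_mem))
```

On this toy program the sketch reports 4, 5, and 3 cycles respectively: removing a memory port serializes the two loads, while cutting load latency shortens the dependent chain. Sweeping `Config` fields in this way mirrors the kind of per-parameter comparison the abstract describes, but the toy numbers say nothing about the paper's actual measurements.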