Many studies have investigated performance improvements gained by exploiting instruction-level parallelism (ILP) on a particular architecture. Unfortunately, these studies report improvement in terms of the number of cycles required to execute a program, without quantitatively estimating the penalty the architecture imposes on the cycle time. Since the performance of a microprocessor must be measured by its execution time, a cycle time evaluation is required in addition to a cycle count speedup evaluation. Currently, superscalar machines are widely accepted as the machines that achieve the highest performance. On the other hand, because of its hardware simplicity and the sophistication of its instruction scheduling, the VLIW architecture is widely expected to underpin the next generation of microprocessors. A simple VLIW machine, however, has a serious weakness in supporting speculative execution, so it remains an open question whether a simple VLIW machine really outperforms a superscalar machine. We recently proposed a mechanism called predicating that supports speculative execution on a VLIW machine, and showed a significant cycle count speedup over a scalar machine. Although the mechanism is simple, it was unknown how large a penalty it imposes on the cycle time, and how much performance improves as a result. This paper evaluates both the cycle count speedup and the cycle time for three ILP machines: a superscalar machine, a simple VLIW machine, and a VLIW machine with predicating. The results show that the simple VLIW machine only slightly outperforms the superscalar machine, while the VLIW machine with predicating achieves a significant speedup of 1.41x over the superscalar machine.