The Nonuniform Distribution of Instruction-Level and Machine Parallelism and its Effect on Performance

Authors:
N. P. Jouppi
Venue:
IEEE Transactions on Computers
Year:
1989


Abstract

A methodology for quickly estimating machine performance is developed. A first-order estimate is based on the average degree of machine parallelism. A second-order model corrects for the effects of nonuniformities in instruction-level and machine parallelism and is shown to be accurate to within 15% for three widely different machine pipelines: the CRAY-1, the MultiTitan, and a dual-issue superscalar machine.
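The abstract does not give the model's formulas, so the following is only a toy illustration of why an estimate built from averages may need a correction for nonuniformity. It is not the paper's first-order or second-order model. The function names, the per-cycle workload, and the issue width are all hypothetical, and the sketch assumes that instructions left unissued in one cycle do not carry over to the next.

```python
from statistics import fmean


def average_based_estimate(available_per_cycle, issue_width):
    """Estimate throughput from averages only (hypothetical simplification).

    Takes the mean number of independent instructions available per cycle
    and caps it at the machine's issue width.
    """
    return min(fmean(available_per_cycle), issue_width)


def per_cycle_estimate(available_per_cycle, issue_width):
    """Estimate throughput by capping each cycle separately, then averaging.

    Cycles with little available parallelism leave issue slots idle, and
    cycles with a lot of it are clipped by the issue width.
    """
    return fmean(min(n, issue_width) for n in available_per_cycle)


# Hypothetical workload: mostly serial code with occasional bursts.
available = [1.0, 1.0, 4.0, 1.0, 1.0, 4.0]  # mean is 2.0
width = 2  # assumed dual-issue machine

uniform = average_based_estimate(available, width)
nonuniform = per_cycle_estimate(available, width)

print(f"average-based estimate: {uniform:.2f} instructions/cycle")
print(f"per-cycle estimate:     {nonuniform:.2f} instructions/cycle")
```

In this made-up workload the average-based figure is 2.00 instructions per cycle while the per-cycle figure is about 1.33, because the bursts of four independent instructions cannot be fully used by a two-wide machine. The per-cycle figure can never exceed the average-based one, which is the general reason a correction term for nonuniform parallelism tends to lower an estimate built from averages.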