Limits on multiple instruction issue

Authors:
M. D. Smith;M. Johnson;M. A. Horowitz
Affiliations:
Center For Integrated Systems, Stanford University, Stanford, CA;Center For Integrated Systems, Stanford University, Stanford, CA;Center For Integrated Systems, Stanford University, Stanford, CA
Venue:
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Year:
1989

Citing 6
Cited 54

Compilers: principles, techniques, and tools

Compilers: principles, techniques, and tools
An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors

IEEE Transactions on Computers
Reducing the cost of branches

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Instruction issue logic for high-performance, interruptable pipelined processors

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
The ZS-1 central processor

ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Look-Ahead Processors

ACM Computing Surveys (CSUR)

Cost-effective design of application specific VLIW processors using the SCARCE framework

MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
The Evolution of Instruction Sequencing

Computer - Special issue on instruction sequencing
Limits of instruction-level parallelism

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Performance from architecture: comparing a RISC and a CISC with similar hardware organization

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
IMPACT: an architectural framework for multiple-instruction-issue processors

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Single instruction stream parallelism is greater than two

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Exploiting fine-grained parallelism through a combination of hardware and software techniques

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The effect of employing advanced branching mechanisms in superscalar processors

ACM SIGARCH Computer Architecture News
Exploiting multi-way branching to boost superscalar processor performance

ACM SIGPLAN Notices
DSNS (dynamically-hazard-resolved statically-code-scheduled, nonuniform superscalar): yet another superscalar processor architecture

ACM SIGARCH Computer Architecture News
How many operation units are adequate?

ACM SIGARCH Computer Architecture News
Comparing static and dynamic code scheduling for multiple-instruction-issue processors

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
The effect of real data cache behavior on the performance of a microarchitecture that supports dynamic scheduling

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Branch Strategies: Modeling and Optimization (Pipeline Processing)

IEEE Transactions on Computers
Computer Technology and Architecture: An Evolving Interaction

Computer
Limits of control flow on parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The expandable split window paradigm for exploiting fine-grain parallelsim

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Dynamic dependency analysis of ordinary programs

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Efficient superscalar performance through boosting

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Improving instruction supply efficiency in superscalar architectures using instruction trace buffers

SAC '92 Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing: technological challenges of the 1990's
On the instruction-level characteristics of scalar code in highly-vectorized scientific applications

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Limitation of superscalar microprocessor performance

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Enhanced superscalar hardware: the schedule table

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
VLIW compilation techniques in a superscalar environment

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Designing the TFP Microprocessor

IEEE Micro
Characterizing the impact of predicated execution on branch prediction

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Evaluating Performance Tradeoffs Between Fine-Grained and Coarse-Grained Alternatives

IEEE Transactions on Parallel and Distributed Systems
Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
A comparison of full and partial predicated execution support for ILP processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Integrating performance monitoring and communication in parallel computers

Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
An analysis of dynamic scheduling techniques for symbolic applications

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Register renaming and dynamic speculation: an alternative approach

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Speculative execution exception recovery using write-back suppression

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Available paralellism in video applications

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Parallelizing nonnumerical code with selective scheduling and software pipelining

ACM Transactions on Programming Languages and Systems (TOPLAS)
The potential of data value speculation to boost ILP

ICS '98 Proceedings of the 12th international conference on Supercomputing
Speculative execution model with duplication

ICS '98 Proceedings of the 12th international conference on Supercomputing
Modeled and Measured Instruction Fetching Performance for Superscalar Microprocessors

IEEE Transactions on Parallel and Distributed Systems
IMPACT: an architectural framework for multiple-instruction-issue processors

25 years of the international symposia on Computer architecture (selected papers)
Better global scheduling using path profiles

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Branch Prediction, Instruction-Window Size, and Cache Size: Performance Trade-Offs and Simulation Techniques

IEEE Transactions on Computers
The impact of synchronization and granularity on parallel systems

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Boosting beyond static scheduling in a superscalar processor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Architectural differences of efficient sequential and parallel computers

Journal of Systems Architecture: the EUROMICRO Journal
Efficient Instruction Sequencing with Inline Target Insertion

IEEE Transactions on Computers
Instruction Window Size Trade-Offs and Characterization of Program Parallelism

IEEE Transactions on Computers
Three Architectural Models for Compiler-Controlled Speculative Execution

IEEE Transactions on Computers
Efficient Exploitation of Instruction-Level Parallelism for Superscalar Processors by the Conjugate Register File Scheme

IEEE Transactions on Computers
Limits and Graph Structure of Available Instruction-Level Parallelism (Research Note)

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Program balance and its impact on high performance RISC architectures

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Characterizing a new class of threads in scientific applications for high end supercomputers

Proceedings of the 18th annual international conference on Supercomputing
A comparison of the effect of branch prediction on multithreaded and scalar architectures

ACM SIGARCH Computer Architecture News
Designing programming languages for the analyzability of pointer data structures

Computer Languages

Quantified Score

Hi-index	0.02

Visualization

Abstract

This paper investigates the limitations on designing a processor which can sustain an execution rate of greater than one instruction per cycle on highly-optimized, non-scientific applications. We have used trace-driven simulations to determine that these applications contain enough instruction independence to sustain an instruction rate of about two instructions per cycle. In a straightforward implementation, cost considerations argue strongly against decoding more than two instructions in one cycle. Given this constraint, the efficiency in instruction fetching rather than the complexity of the execution hardware limits the concurrency attainable at the instruction level.