Instruction Window Size Trade-Offs and Characterization of Program Parallelism

Authors:
P. K. Dubey;G. B. Adams, III;M. J. Flynn
Affiliations:
-;-;-
Venue:
IEEE Transactions on Computers
Year:
1994

Citing 16
Cited 9

An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors

IEEE Transactions on Computers
Highly concurrent scalar processing

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
HPS, a new microarchitecture: rationale and introduction

MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
Instruction issue logic for high-performance, interruptable pipelined processors

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
A VLIW architecture for a trace Scheduling Compiler

IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Architecture and compiler tradeoffs for a long instruction wordprocessor

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
IBM RISC System/6000 processor architecture

IBM Journal of Research and Development
Limits of instruction-level parallelism

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Architecture and implementation of a VLIW supercomputer

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Single instruction stream parallelism is greater than two

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Branch Strategies: Modeling and Optimization (Pipeline Processing)

IEEE Transactions on Computers
Dynamic trace analysis for analytic modeling of superscalar performance

Performance Evaluation - Special issue: performance modeling of parallel processing systems
Using a lookahead window in a compaction-based parallelizing compiler

MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Instruction issue logic for pipelined supercomputers

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture

Theoretical modeling of superscalar processor performance

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
An analytical model of high performance superscalar-based multiprocessors

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Sensitivity analysis of a superscalar processor model

CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Control Flow Modeling in Statistical Simulation for Accurate and Efficient Processor Design Studies

Proceedings of the 31st annual international symposium on Computer architecture
Measuring Benchmark Similarity Using Inherent Program Characteristics

IEEE Transactions on Computers
Predicting communication protocol performance on superscalar architectures using instruction dependency

Performance Evaluation
Distilling the essence of proprietary workloads into miniature benchmarks

ACM Transactions on Architecture and Code Optimization (TACO)
A mechanistic performance model for superscalar out-of-order processors

ACM Transactions on Computer Systems (TOCS)
AVF Stressmark: Towards an Automated Methodology for Bounding the Worst-Case Vulnerability to Soft Errors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	14.98

Visualization

Abstract

Detecting independent operations is a prime objective for computers that are capable of issuing and executing multiple operations simultaneously. The number of instructions that are simultaneously examined for detecting those that are independent is the scope of concurrency detection. The authors present an analytical model for predicting the performance impact of varying the scope of concurrency detection as a function of available resources, such as number of pipelines in a superscalar architecture. The model developed can show where a performance bottleneck might be: insufficient resources to exploit discovered parallelism, insufficient instruction stream parallelism, or insufficient scope of concurrency detection. The cost associated with speculative execution is examined via a set of probability distributions that characterize the inherent parallelism in the instruction stream. These results were derived using traces from a Multiflow TRACE SCHEDULING compacting FORTRAN 77 and C compilers. The experiments provide misprediction delay estimates for 11 common application-level benchmarks under scope constraints, assuming speculative, out-of-order execution and run time scheduling. The throughput prediction of the analytical model is shown to be close to the measured static throughput of the compiler output.