Characterization of branch and data dependencies on programs for evaluating pipeline performance

Authors:
P. G. Emma; E. S. Davidson
Venue:
IEEE Transactions on Computers
Year:
1987

Abstract

The nature by which branches and data dependencies generate delays that degrade pipeline performance is investigated in this paper. We show that for the general execution trace, few specific delays can be considered in isolation; rather, the magnitude of any specific delay may depend on the relative proximity of other delays. This phenomenon can make the task of accurately characterizing a trace tape with simple statistics intractable. We present a set of trace reductions that facilitates this task by simplifying the corresponding data-dependency graph. The reductions operate on multiple data-dependency arcs and branches in conjunction; those arcs whose performance implications are redundant with respect to the dependency graph are identified and eliminated from the graph. We show that the reduced graph can be accurately characterized by simple statistics. We use these statistics to show that as the length of a pipeline increases, the performance degradation due to data dependencies and branches increases monotonically. However, lengthening the pipeline may correspond to decreasing the cycle time of the pipeline. These two opposing effects are used in conjunction to derive an equation for optimal pipeline length for a given trace tape. The optimal pipeline length is shown to be characterized by n = √(γα), where γ is the ratio of overall circuit delay to latching overhead, and α is a function of the trace statistics that accounts for the delays induced by data dependencies and branches.
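
The trade-off behind the closing formula can be illustrated with a small numerical sketch. It is not the paper's model: it assumes the cycle time is the latching overhead plus an equal share of the total circuit delay, and that the average number of cycles per instruction grows linearly with pipeline length as 1 + n/α. It also reads the abstract's formula as the square root of the product γα. The function names and the numbers are hypothetical, chosen only for illustration.

```python
import math


def time_per_instruction(n, total_logic_delay, latch_overhead, alpha):
    """Average time per instruction for an n-stage pipeline.

    Assumed model (not taken from the paper):
      cycle time             = latch_overhead + total_logic_delay / n
      cycles per instruction = 1 + n / alpha
    """
    cycle_time = latch_overhead + total_logic_delay / n  # shrinks as n grows
    cycles_per_instruction = 1.0 + n / alpha             # grows as n grows
    return cycle_time * cycles_per_instruction


def optimal_length(total_logic_delay, latch_overhead, alpha):
    """Closed-form minimum of the assumed model: n = sqrt(gamma * alpha)."""
    gamma = total_logic_delay / latch_overhead  # circuit delay / latching overhead
    return math.sqrt(gamma * alpha)


# Hypothetical values: 100 time units of logic, 2 units of latch overhead.
T, T_LATCH, ALPHA = 100.0, 2.0, 8.0

n_star = optimal_length(T, T_LATCH, ALPHA)

# Brute-force check over integer pipeline lengths 1..100.
best_integer_n = min(
    range(1, 101),
    key=lambda n: time_per_instruction(n, T, T_LATCH, ALPHA),
)

print(n_star, best_integer_n)
```

Under these assumptions, setting the derivative of the time per instruction to zero gives n² = γα, so the brute-force search and the closed form should agree. A smaller α (more delay per added stage) pushes the optimum toward shorter pipelines, and a larger γ pushes it toward longer ones.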