The optimum pipeline depth for a microprocessor

Authors:
A. Hartstein; Thomas R. Puzak
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY (both authors)
Venue:
ISCA '02: Proceedings of the 29th Annual International Symposium on Computer Architecture
Year:
2002




Abstract

The impact of pipeline length on the performance of a microprocessor is explored both theoretically and by simulation. An analytical theory is presented showing that two opposing architectural parameters affect the optimal pipeline length: a higher degree of instruction-level parallelism (superscalar issue width) decreases the optimal pipeline length, while fewer pipeline stalls increase it. This theory is tested by analyzing the optimal pipeline length for 35 applications representing three classes of workloads. Trace tapes are collected from SPEC95 and SPEC2000 applications, from traditional (legacy) database and on-line transaction processing (OLTP) applications, and from modern (e.g., web) applications written primarily in Java and C++. The results show a clear and significant difference in optimal pipeline length between the SPEC workloads and both the legacy and modern applications. The SPEC applications, written in C, optimize to a shorter pipeline length than the legacy applications, which are largely written in assembly language, and the two distributions overlap relatively little. Additionally, the optimal pipeline length distribution for the C++ and Java workloads overlaps with that of the legacy applications, suggesting similar workload characteristics. These results are explored across a wide range of superscalar processors, both in-order and out-of-order.
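The abstract states the theory's conclusions but not its form. As a hedged illustration (a generic first-order model, not necessarily the paper's own derivation; all symbols and parameter values below are our assumptions), suppose splitting a datapath with total logic delay t_p into p stages gives a cycle time of t_o + t_p/p, where t_o is per-stage latch overhead. An alpha-wide machine then spends (t_o + t_p/p)/alpha per instruction when unstalled, and each hazard (h per instruction) drains a fraction gamma of the p-stage pipeline, costing gamma*p*(t_o + t_p/p). Setting the derivative with respect to p to zero gives p_opt = sqrt(t_p / (alpha * gamma * h * t_o)). This reproduces both trends named above: wider issue (larger alpha) shortens the optimum, and fewer stalls (smaller h or gamma) lengthen it.

```python
import math


def time_per_instruction(p, t_p, t_o, alpha, gamma, h):
    """Average time per instruction for pipeline depth p (hypothetical first-order model).

    t_p   -- total logic delay of the unpipelined datapath (assumed units, e.g. FO4)
    t_o   -- latch/clocking overhead added by each stage
    alpha -- superscalar degree: instructions completed per cycle when not stalled
    gamma -- fraction of the pipeline drained by a hazard (0 < gamma <= 1)
    h     -- pipeline-stalling hazards per instruction (N_H / N_I)
    """
    cycle_time = t_o + t_p / p
    busy = cycle_time / alpha                 # useful work, shared across alpha issue slots
    stall = gamma * h * p * cycle_time        # equals gamma*h*(t_o*p + t_p): grows with depth
    return busy + stall


def optimal_depth(t_p, t_o, alpha, gamma, h):
    """Continuous minimizer of time_per_instruction.

    d/dp [ t_p/(alpha*p) + gamma*h*t_o*p + const ] = 0
    => p_opt**2 = t_p / (alpha * gamma * h * t_o)
    """
    return math.sqrt(t_p / (alpha * gamma * h * t_o))


if __name__ == "__main__":
    # Illustrative parameters only; they are NOT measurements from the paper.
    base = dict(t_p=140.0, t_o=2.5, alpha=2, gamma=0.5, h=0.1)
    print("baseline p_opt:", optimal_depth(**base))
    print("wider issue (alpha=4):", optimal_depth(**{**base, "alpha": 4}))
    print("fewer hazards (h=0.05):", optimal_depth(**{**base, "h": 0.05}))
```

Because the time per instruction has the form a/p + b*p + c, it is convex in p, so the best integer depth is always the floor or ceiling of p_opt; a brute-force search over integer depths is a quick sanity check on the closed form.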