Optimum Power/Performance Pipeline Depth

Authors:
A. Hartstein; Thomas R. Puzak
Affiliations:
IBM T. J. Watson Research Center, Yorktown Heights, NY; IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue:
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2003

Abstract

The impact of pipeline length on both the power and performance of a microprocessor is explored both theoretically and by simulation. A theory is presented for a wide range of power/performance metrics of the form BIPS^m/W (billions of instructions per second raised to the power m, per watt). The theory shows that the more important power is to the metric, the shorter the resulting optimum pipeline length. For typical parameters, neither BIPS/W nor BIPS^2/W yields an optimum; i.e., a non-pipelined design is optimal. For BIPS^3/W, the optimum, averaged over all 55 workloads studied, occurs at a 22.5 FO4 (fan-out-of-four inverter delay) design point, a 7-stage pipeline, but this value is highly dependent on the assumed growth in latch count with pipeline depth. As dynamic power grows, the optimal design point shifts to shorter pipelines. Clock gating pushes the optimum to deeper pipelines. Surprisingly, as leakage power grows, the optimum is also found to shift to deeper pipelines. The optimum pipeline depth varies for different classes of workloads: SPEC95 and SPEC2000 integer applications, traditional (legacy) database and on-line transaction processing applications, modern (e.g., web) applications, and floating-point applications.
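To make the BIPS^m/W trade-off concrete, here is a minimal sketch of a depth sweep under a deliberately simplified toy model. It is not the authors' theory or equations. The assumptions are: cycle time is a fixed latch overhead `t_o` plus total logic delay `t_p` split over `p` stages; each stall event (rate `hazard` per instruction) costs one full pipeline drain; latch count grows as `p ** beta`; and each latch burns dynamic power proportional to clock frequency (`dyn`) plus constant leakage (`leak`). All function names and default values are hypothetical and chosen only for illustration. Clock gating is not modeled.

```python
"""Toy sweep of BIPS^m/W versus pipeline depth (illustrative assumptions only)."""


def time_per_instruction(p, t_o, t_p, hazard):
    """Average time per instruction for a p-stage pipeline (toy model).

    t_o: latch overhead per stage, t_p: total logic delay (both in FO4),
    hazard: assumed stall events per instruction, each costing a full
    pipeline drain of p * cycle = t_o * p + t_p.
    """
    cycle = t_o + t_p / p
    return cycle + hazard * (t_o * p + t_p)


def power(p, t_o, t_p, beta, dyn, leak):
    """Relative power: latch count grows as p**beta; each latch burns
    dynamic power proportional to clock frequency plus constant leakage."""
    freq = 1.0 / (t_o + t_p / p)
    latches = p ** beta
    return latches * (dyn * freq + leak)


def metric(p, m, t_o=2.5, t_p=140.0, hazard=0.05, beta=1.1, dyn=1.0, leak=0.0):
    """BIPS^m / W in arbitrary units (BIPS taken as 1 / time per instruction)."""
    bips = 1.0 / time_per_instruction(p, t_o, t_p, hazard)
    return bips ** m / power(p, t_o, t_p, beta, dyn, leak)


def optimum_depth(m, max_depth=60, **params):
    """Pipeline depth in 1..max_depth that maximizes BIPS^m / W."""
    return max(range(1, max_depth + 1), key=lambda p: metric(p, m, **params))
```

With these hypothetical defaults, `optimum_depth(1)` and `optimum_depth(2)` both return 1 (a non-pipelined design), while `optimum_depth(3)` returns a deeper pipeline. Within this toy model, raising `beta` (faster latch growth) or `dyn` never deepens the optimum, and raising `leak` never makes it shallower, which mirrors the qualitative trends stated in the abstract without reproducing the paper's numbers.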