Optimizing pipelines for power and performance

Authors:
Viji Srinivasan; David Brooks; Michael Gschwind; Pradip Bose; Victor Zyuban; Philip N. Strenski; Philip G. Emma
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY (all authors)
Venue:
Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture
Year:
2002

Abstract

During the concept and definition phases of next-generation high-end processors, power and performance will need to be weighted appropriately to deliver competitive cost/performance. It is not enough to adopt a CPI-centric view alone in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive the optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation-based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results using a set of SPEC2000 applications show that when both power and performance are considered for optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.
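
To make the idea of an analytical power-performance model concrete, the following is a minimal sketch of how one might search for an optimal pipeline depth. It is not the paper's model: the functional forms (logic split evenly across stages, stall cycles growing linearly with depth, latch count growing as a power of depth), the performance-cubed-per-watt metric, and every constant are assumptions chosen only for illustration.

```python
# Toy model only: every constant and functional form here is an illustrative
# assumption, not a value or equation from the paper.
TOTAL_LOGIC_FO4 = 180.0  # assumed logic delay of the whole unpipelined path, in FO4
LATCH_FO4 = 3.0          # assumed latch and clocking overhead added to every stage
HAZARD_RATE = 0.05       # assumed stall cycles per instruction contributed by each stage
LATCH_GROWTH = 1.1       # assumed exponent: latch count grows as depth ** LATCH_GROWTH


def cycle_time(depth):
    """Clock period in FO4: logic split evenly across stages plus latch overhead."""
    return TOTAL_LOGIC_FO4 / depth + LATCH_FO4


def time_per_instruction(depth):
    """Average FO4 per instruction: deeper pipelines pay more stall cycles."""
    return cycle_time(depth) * (1.0 + HAZARD_RATE * depth)


def performance(depth):
    """Relative throughput: instructions completed per FO4 of elapsed time."""
    return 1.0 / time_per_instruction(depth)


def power(depth):
    """Relative dynamic power: clock frequency times latch count (leakage ignored)."""
    return (1.0 / cycle_time(depth)) * depth ** LATCH_GROWTH


def best_depth(perf_exponent, include_power, depths=range(1, 61)):
    """Return the stage count in `depths` that maximizes the chosen metric."""
    def score(depth):
        value = performance(depth) ** perf_exponent
        return value / power(depth) if include_power else value
    return max(depths, key=score)


perf_only = best_depth(perf_exponent=1, include_power=False)
power_aware = best_depth(perf_exponent=3, include_power=True)
print(f"performance only: {perf_only} stages, {cycle_time(perf_only):.1f} FO4 per stage")
print(f"performance^3 per watt: {power_aware} stages, {cycle_time(power_aware):.1f} FO4 per stage")
```

With these made-up constants, the power-aware metric selects a shallower pipeline (a longer clock period) than the performance-only metric, which is the qualitative trade-off the abstract describes. The specific numbers the script prints carry no significance and should not be compared with the paper's 18 FO4 result.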