Optimal pipelining in supercomputers
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Characterization of branch and data dependencies on programs for evaluating pipeline performance
IEEE Transactions on Computers
Journal of Parallel and Distributed Computing
The floating-point unit of the PowerPC 603e microprocessor
IBM Journal of Research and Development
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Energy-driven integrated hardware-software optimizations using SimplePower
Proceedings of the 27th annual international symposium on Computer architecture
On pipelining dynamic instruction scheduling logic
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Clocking strategies and scannable latches for low power appliacations
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
The Architecture of Symbolic Computers
The Architecture of Symbolic Computers
The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Increasing processor performance by implementing deeper pipelines
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Select-free instruction scheduling logic
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 2002 international symposium on Low power electronics and design
Deep-Submicron Microprocessor Design Issues
IEEE Micro
Representative Traces for Processor Models with Infinite Cache
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Activity-Sensitive Flip-Flop and Latch Selection for Reduced Energy
ARVLSI '01 Proceedings of the 2001 Conference on Advanced Research in VLSI
Inherently lower-power high-performance superscalar architectures
Inherently lower-power high-performance superscalar architectures
Optimum Power/Performance Pipeline Depth
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Energy estimation of peripheral devices in embedded systems
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Power Awareness through Selective Dynamically Optimized Traces
Proceedings of the 31st annual international symposium on Computer architecture
Power-optimal pipelining in deep submicron technology
Proceedings of the 2004 international symposium on Low power electronics and design
Balancing hardware intensity in microprocessor pipelines
IBM Journal of Research and Development
IBM Journal of Research and Development
Journal of Systems and Software - Special issue: Performance modeling and analysis of computer systems and networks
The optimum pipeline depth considering both power and performance
ACM Transactions on Architecture and Code Optimization (TACO)
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors
Proceedings of the 32nd annual international symposium on Computer Architecture
Control Speculation for Energy-Efficient Next-Generation Superscalar Processors
IEEE Transactions on Computers
An integrated performance and power model for superscalar processor designs
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Dynamic instruction schedulers in a 3-dimensional integration technology
GLSVLSI '06 Proceedings of the 16th ACM Great Lakes symposium on VLSI
Chip multiprocessing and the cell broadband engine
Proceedings of the 3rd conference on Computing frontiers
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
A pulsed low-voltage swing latch for reduced power dissipation in high-frequency microprocessors
Proceedings of the 2006 international symposium on Low power electronics and design
Mitigating the Impact of Process Variations on Processor Register Files and Execution Units
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Improving the accuracy of snoop filtering using stream registers
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Optimal Power/Performance Pipeline Depth for SMT in Scaled Technologies
IEEE Transactions on Computers
Microarchitecture and implementation of the synergistic processor in 65-nm and 90-nm SOI
IBM Journal of Research and Development
The cell broadband engine: exploiting multiple levels of parallelism in a chip multiprocessor
International Journal of Parallel Programming
Optimal pipeline depth with pipeline stage unification adoption
ACM SIGARCH Computer Architecture News - Special issue: ALPS '07---advanced low power systems
Reducing complexity of multiobjective design space exploration in VLIW-based embedded systems
ACM Transactions on Architecture and Code Optimization (TACO)
Toward a multicore architecture for real-time ray-tracing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A Dynamic Control Mechanism for Pipeline Stage Unification by Identifying Program Phases
IEICE - Transactions on Information and Systems
A mechanistic performance model for superscalar out-of-order processors
ACM Transactions on Computer Systems (TOCS)
Profile-based dynamic pipeline scaling
The Journal of Supercomputing
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Circuit techniques for dynamic variation tolerance
Proceedings of the 46th Annual Design Automation Conference
Program phase detection based dynamic control mechanisms for pipeline stage unification adoption
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Energy-performance tradeoffs in processor architecture and circuit design: a marginal cost analysis
Proceedings of the 37th annual international symposium on Computer architecture
Integrated execution: a programming model for accelerators
IBM Journal of Research and Development
Introduction to the wire-speed processor and architecture
IBM Journal of Research and Development
Supervised learning based power management for multicore processors
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Exploring the architecture of a stream register-based snoop filter
Transactions on high-performance embedded architectures and compilers III
Journal of Computer Science and Technology
Pipeline strategy for improving optimal energy efficiency in ultra-low voltage design
Proceedings of the 48th Design Automation Conference
Co-optimization of performance and power in a superscalar processor design
EUC'06 Proceedings of the 2006 international conference on Emerging Directions in Embedded and Ubiquitous Computing
Misleading energy and performance claims in sub/near threshold digital systems
Proceedings of the International Conference on Computer-Aided Design
PARROT: power awareness through selective dynamically optimized traces
PACS'03 Proceedings of the Third international conference on Power - Aware Computer Systems
Finding extreme behaviors in microprocessor workloads
Transactions on High-Performance Embedded Architectures and Compilers IV
A fine-grained many VT design methodology for ultra low voltage operations
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.01 |
During the concept phase and definition of next generation high-end processors, power and performance will need to be weighted appropriately to deliver competitive cost/performance. It is not enough to adopt a CPl-centric view alone in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results using a set of SPEC2000 applications show that when both power and performance are considered for optimization, the optimal clock period is around 18 F04. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.