The impact of pipeline length on both the power and performance of a microprocessor is explored both theoretically and by simulation. A theory is presented for a wide range of power/performance metrics, BIPS^m/W. The theory shows that the more important power is to the metric, the shorter the optimum pipeline length that results. For typical parameters, neither BIPS/W nor BIPS^2/W yields an optimum, i.e., a non-pipelined design is optimal. For BIPS^3/W the optimum, averaged over all 55 workloads studied, occurs at a 22.5 FO4 design point, a 7-stage pipeline, but this value is highly dependent on the assumed growth in latch count with pipeline depth. As dynamic power grows, the optimal design point shifts to shorter pipelines. Clock gating pushes the optimum to deeper pipelines. Surprisingly, as leakage power grows, the optimum is also found to shift to deeper pipelines. The optimum pipeline depth varies for different classes of workloads: SPEC95 and SPEC2000 integer applications, traditional (legacy) database and on-line transaction processing applications, modern (e.g., web) applications, and floating point applications.
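
As a rough illustration of how the exponent m in the BIPS^m/W family changes which design wins, the following Python sketch evaluates the metric for three design points. The design points, their names, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper and do not reproduce its results. The function names are likewise assumptions.

```python
def power_performance_metric(bips: float, watts: float, m: float) -> float:
    """Return BIPS^m / W for a single design point."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return bips ** m / watts


# Hypothetical design points: name -> (BIPS, watts).
# Illustrative values only, NOT data from the paper.
DESIGN_POINTS = {
    "shallow": (1.0, 10.0),
    "medium": (1.6, 24.0),
    "deep": (2.0, 45.0),
}


def best_design(m: float) -> str:
    """Return the name of the design point that maximizes BIPS^m / W."""
    return max(
        DESIGN_POINTS,
        key=lambda name: power_performance_metric(*DESIGN_POINTS[name], m),
    )


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(f"m={m}: best design is {best_design(m)}")
```

With these made-up numbers, a smaller m (power weighted more heavily relative to performance) favors the shallower design, and a larger m favors the deeper one, which matches the qualitative trend the abstract describes.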