Total power-optimal pipelining and parallel processing under process variations in nanometer technology

Authors:
Nam Sung Kim;Taeho Kgil;K. Bowman;V. De;T. Mudge
Affiliations:
Intel Corp., Hillsboro, Oregon;California Univ., La Jolla, CA, USA;California Univ., La Jolla, CA, USA;California Univ., La Jolla, CA, USA;California Univ., La Jolla, CA, USA
Venue:
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Year:
2005

Citing 10
Cited 11

The impact of intra-die device parameter variations on path delays and on the design for yield of low voltage digital circuits

IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
Life is CMOS: why chase the life after?

Proceedings of the 39th annual Design Automation Conference
The optimum pipeline depth for a microprocessor

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Increasing processor performance by implementing deeper pipelines

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Optimizing pipelines for power and performance

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimum Power/Performance Pipeline Depth

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Gate leakage reduction for scaled devices using transistor stacking

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design and reliability challenges in nanometer technologies

Proceedings of the 41st annual Design Automation Conference
Power-optimal pipelining in deep submicron technology

Proceedings of the 2004 international symposium on Low power electronics and design

Mitigating the Impact of Process Variations on Processor Register Files and Execution Units

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Microarchitecture parameter selection to optimize system performance under process variation

Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Working with process variation aware caches

Proceedings of the conference on Design, automation and test in Europe
Variation-aware resource sharing and binding in behavioral synthesis

Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Error-resilient motion estimation architecture

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Analyzing impact of multiple ABB and AVS domains on throughput of power and thermal-constrained multi-core processors

Proceedings of the 2010 Asia and South Pacific Design Automation Conference
Overhead-aware energy optimization for real-time streaming applications on multiprocessor System-on-Chip

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Uncertainty-aware dynamic power management in partially observable domains

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
DEFCAM: A design and evaluation framework for defect-tolerant cache memories

ACM Transactions on Architecture and Code Optimization (TACO)
Misleading energy and performance claims in sub/near threshold digital systems

Proceedings of the International Conference on Computer-Aided Design
Design of pre-processing algorithms for efficient MIMO-OFDM receiver architectures

Physical Communication

Quantified Score

Hi-index	0.06

Abstract

This paper explores the effectiveness of applying pipelining and parallel processing simultaneously as a total power (static plus dynamic) reduction technique in digital systems. Previous studies have been limited to either pipelining or parallel processing, but the two techniques can be used together to reduce supply voltage at a fixed throughput point. According to our first-order analyses, there exist optimal combinations of pipelining depth and parallel processing width that minimize total power consumption. We show that the leakage power from both subthreshold and gate-oxide tunneling plays a significant role in determining the optimal combination of pipelining depth and parallel processing width. Our experiments are conducted with timing information derived from a 65nm technology and fanout-of-four (FO4) inverter chains. The experiments show that the optimal combinations of pipelining and parallel processing - 8 ~ 12 × FO4 logic depth pipelining with 2 ~ 3-wide parallel processing - can reduce the total power by as much as 40% compared to an optimal system using pipelining alone or parallel processing alone. We extend our study to show how process parameter variations - an increasingly important factor in nanometer technologies - affect these results. Our analyses reveal that the variations shift the optimal points to shallower pipelining and narrower parallel processing - 12 × FO4 logic depth with 2-wide parallel processing - at a fixed yield point.
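
The trade-off described in the abstract can be illustrated with a minimal first-order sketch. It is not the authors' model: the alpha-power delay law, the leakage expression, and every constant and function name (`fo4_delay`, `min_vdd`, `total_power`, `best_design`) are assumptions chosen for illustration. For each logic depth per stage and each parallel width, the sketch finds the lowest supply voltage that still meets a fixed throughput and then adds dynamic and static power in arbitrary units. Process variation is not modeled.

```python
import math

# Every constant here is an illustrative assumption, not a value from the paper.
VTH = 0.3            # threshold voltage (V)
ALPHA = 1.3          # alpha-power-law exponent
V_MAX = 1.2          # highest supply voltage considered (V)
V_MIN = VTH + 0.05   # lowest supply voltage considered (V)
LOGIC_FO4 = 96.0     # logic depth of the unpipelined datapath (FO4 delays)
LATCH_FO4 = 3.0      # latch and clocking overhead per stage (FO4 delays)
T_TARGET = 40.0      # time allowed per result, in FO4 delays measured at V_MAX
LATCH_COST = 0.05    # size of one latch bank relative to the whole logic
MUX_COST = 0.05      # extra switching per added lane (input/output muxing)
LEAK = 0.01          # leakage scale relative to dynamic power
DIBL = 2.0           # sensitivity of leakage current to Vdd (1/V)

DEPTHS = (6.0, 8.0, 12.0, 16.0, 24.0, 32.0, 48.0, 96.0)  # logic FO4 per stage
WIDTHS = (1, 2, 3, 4)                                     # parallel lanes


def fo4_delay(vdd):
    """FO4 delay at vdd, normalized to the FO4 delay at V_MAX."""
    ref = V_MAX / (V_MAX - VTH) ** ALPHA
    return (vdd / (vdd - VTH) ** ALPHA) / ref


def min_vdd(stage_fo4, width):
    """Lowest Vdd in [V_MIN, V_MAX] that meets the throughput, or None."""
    cycle_fo4 = stage_fo4 + LATCH_FO4
    budget = width * T_TARGET  # each lane may take `width` times longer
    if cycle_fo4 * fo4_delay(V_MAX) > budget:
        return None            # too slow even at the highest voltage
    if cycle_fo4 * fo4_delay(V_MIN) <= budget:
        return V_MIN
    lo, hi = V_MIN, V_MAX      # invariant: lo fails timing, hi meets it
    for _ in range(60):        # bisection; delay falls monotonically with Vdd
        mid = 0.5 * (lo + hi)
        if cycle_fo4 * fo4_delay(mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi


def total_power(stage_fo4, width):
    """Dynamic plus static power (arbitrary units), or None if infeasible."""
    vdd = min_vdd(stage_fo4, width)
    if vdd is None:
        return None
    stages = LOGIC_FO4 / stage_fo4
    size = 1.0 + LATCH_COST * stages  # logic plus one latch bank per stage
    # All lanes together still deliver one result per T_TARGET.
    dynamic = size * (1.0 + MUX_COST * (width - 1)) * vdd ** 2 / T_TARGET
    # Every lane leaks continuously, so static power grows with width.
    static = LEAK * width * size * vdd * math.exp(DIBL * (vdd - V_MAX))
    return dynamic + static


def best_design(depths=DEPTHS, widths=WIDTHS):
    """Return (power, stage_fo4, width) with the lowest total power, or None."""
    candidates = []
    for depth in depths:
        for width in widths:
            power = total_power(depth, width)
            if power is not None:
                candidates.append((power, depth, width))
    return min(candidates) if candidates else None


if __name__ == "__main__":
    print("combined:     ", best_design())
    print("pipeline only:", best_design(widths=(1,)))
    print("parallel only:", best_design(depths=(LOGIC_FO4,)))
```

Because the constants are placeholders, the printed optimum says nothing about the paper's reported 8 ~ 12 × FO4 and 2 ~ 3-wide results. The sketch shows only the structure of the search: deeper pipelining and wider parallelism both lower the required supply voltage, while latch overhead and per-lane leakage push the optimum back.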