IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
Life is CMOS: why chase the life after?
Proceedings of the 39th annual Design Automation Conference
The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Increasing processor performance by implementing deeper pipelines
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Optimizing pipelines for power and performance
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimum Power/Performance Pipeline Depth
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Gate leakage reduction for scaled devices using transistor stacking
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design and reliability challenges in nanometer technologies
Proceedings of the 41st annual Design Automation Conference
Power-optimal pipelining in deep submicron technology
Proceedings of the 2004 international symposium on Low power electronics and design
Mitigating the Impact of Process Variations on Processor Register Files and Execution Units
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Microarchitecture parameter selection to optimize system performance under process variation
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Working with process variation aware caches
Proceedings of the conference on Design, automation and test in Europe
Variation-aware resource sharing and binding in behavioral synthesis
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Error-resilient motion estimation architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Uncertainty-aware dynamic power management in partially observable domains
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
DEFCAM: A design and evaluation framework for defect-tolerant cache memories
ACM Transactions on Architecture and Code Optimization (TACO)
Misleading energy and performance claims in sub/near threshold digital systems
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.06 |
This paper explores the effectiveness of the simultaneous application of pipelining and parallel processing as a total power (static plus dynamic) reduction technique in digital systems. Previous studies have been limited to either pipelining or parallel processing, but both techniques can be used together to reduce supply voltage at a fixed throughput point. According to our first-order analyses, there exist optimal combinations of pipelining depth and parallel processing width to minimize total power consumption. We show that the leakage power from both subthreshold and gate-oxide tunneling plays a significant role in determining the optimal combination of pipelining depth and parallel processing width. Our experiments are conducted with timing information derived from a 65nm technology and fanout-of-four (FO4) inverter chains. The experiments show that the optimal combinations of both pipelining and parallel processing - 8 /spl sim/ 12 /spl times/ FO4 logic depth pipelining with 2 /spl sim/ 3-wide parallel processing - can reduce the total power by as much as 40% compared to an optimal system using only pipelining or parallel processing alone. We extend our study to show how process parameter variations - an increasingly important factor in nanometer technologies - affects these results. Our analyses reveal that the variations shift the optimal points to shallower pipelining and narrower parallel processing - 12 /spl times/ FO4 logic depth with 2-wide parallel processing - at a fixed yield point.