Improved clock-gating through transparent pipelining
Proceedings of the 2004 international symposium on Low power electronics and design
IBM Journal of Research and Development
Stretching the Limits of Clock-Gating Efficiency in Server-Class Processors
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Stall cycle redistribution in a transparent fetch pipeline
Proceedings of the 2006 international symposium on Low power electronics and design
Thousand core chips: a technology perspective
Proceedings of the 44th annual Design Automation Conference
Hi-index | 0.00 |
Instruction-driven clock scheduling is a mechanism that minimizes clock power in deeply-pipelined datapaths. Analysis of realistic processor workloads shows a preponderance of bubbles persist through pipelines like the floating point unit. Clock scheduling ostensibly adapts pipeline depth with respect to bubbles in the instruction stream without performance loss. Unfortunately, shallower pipelines (i.e. longer pipe stages) are prone to larger amounts of glitches propagating through logic, increasing dynamic power. Experimentally measured results from a 130nm FPU test chip with flexible clocking capabilities show a super-linear increase in glitch-induced dynamic power for shallower pipelines. While higher glitch power can severely diminish the power savings offered by clock scheduling, judicious clocking of intermediate stages offers glitch mitigation to recover power savings for worst-case scenarios. Detailed analysis of clock scheduling applied to a FPU in a POWER4-like processor running realistic workloads shows an average net power savings of 15% compared to an aggressively clock-gated design.