Leakage-tolerant design techniques for high performance processors

  • Authors: Vivek De
  • Affiliations: Microprocessor Research, Intel Labs, Hillsboro, OR
  • Venue: Proceedings of the 2002 International Symposium on Physical Design
  • Year: 2002

Abstract

In sub-100nm technology generations, transistor subthreshold leakage is 100-1000nA/μm for high performance microprocessor logic technology. As gate oxide thickness approaches the sub-10Å regime, gate oxide leakage escalates to 10-100A/cm². Junction leakages also become significant as doping levels around the junction approach 5×10¹⁸ cm⁻³. These excessive leakage currents contribute to large leakage power dissipation during (1) active operation, (2) standby or idle mode and (3) burn-in. In addition, excessive subthreshold leakage degrades noise margin or robustness of performance-critical circuits such as wide-OR domino gates, register files and cache. Large gate oxide leakage also limits circuit fanout. Therefore, high performance and low power processor designs must employ leakage power control techniques to alleviate active power dissipation and delivery challenges, extend battery life and prevent thermal runaway during burn-in. In addition, leakage-tolerant high performance circuits must be used to provide adequate circuit robustness.

The most effective design technique for reducing leakage power is dual-VT design. Performance-critical transistors are made low-VT to provide the required performance and high-VT transistors are used everywhere else to minimize leakage power without impacting processor frequency. Optimal dual-VT designs can provide the same frequency as a design with single low-VT, while limiting low-VT usage to 30%. As a result, leakage power during active, standby and burn-in is reduced by 3X without any performance impact. Of course, process complexity is slightly higher since extra critical masking steps are needed to provide additional transistor VT's. EDA tools for optimal VT allocation during all phases of the design flow are critical for successful dual-VT designs. VT allocations can be performed at logic gate level or transistor level. While transistor level allocation is most effective for leakage power reduction, it is also the most complex.
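A minimal sketch of gate-level dual-VT allocation may help make the idea concrete. It is not the allocation method used by the author or by any particular EDA tool: it is a simple greedy heuristic, and the netlist, delay values, leakage values, and function names (`critical_delay`, `allocate_dual_vt`) are all hypothetical. It starts from an all-low-VT design, takes that design's critical-path delay as the frequency target, and moves a gate to high-VT only when the target is still met.

```python
def critical_delay(gates, fanin, delay):
    """Longest-path delay of a DAG; `gates` must be in topological order."""
    arrival = {}
    for g in gates:
        arrival[g] = delay[g] + max((arrival[p] for p in fanin[g]), default=0.0)
    return max(arrival.values())


def allocate_dual_vt(gates, fanin, d_low, d_high, leak_low, leak_high):
    """Greedy gate-level allocation: return the set of gates moved to high-VT."""
    # Target: the critical-path delay of a design built only from low-VT gates.
    target = critical_delay(gates, fanin, d_low)
    delay = dict(d_low)
    high_vt = set()
    # Visit gates with the largest leakage saving first (assumed heuristic).
    order = sorted(gates, key=lambda g: leak_low[g] - leak_high[g], reverse=True)
    for g in order:
        delay[g] = d_high[g]          # tentatively swap this gate to high-VT
        if critical_delay(gates, fanin, delay) <= target:
            high_vt.add(g)            # timing still met: keep the swap
        else:
            delay[g] = d_low[g]       # timing violated: revert to low-VT
    return high_vt


# Hypothetical 4-gate netlist: a -> c -> d is the critical path, b -> d is not.
gates = ["a", "b", "c", "d"]
fanin = {"a": [], "b": [], "c": ["a"], "d": ["c", "b"]}
d_low = {g: 10.0 for g in gates}      # ps, made-up values
d_high = {g: 14.0 for g in gates}     # ps, made-up values
leak_low = {g: 10.0 for g in gates}   # arbitrary units, made-up values
leak_high = {g: 1.0 for g in gates}   # arbitrary units, made-up values

high_vt = allocate_dual_vt(gates, fanin, d_low, d_high, leak_low, leak_high)
total_leak = sum(leak_high[g] if g in high_vt else leak_low[g] for g in gates)
print(sorted(high_vt), total_leak)
```

In this toy netlist only the off-critical gate `b` can move to high-VT. A transistor-level allocator would apply the same check per device rather than per gate, which is where the added complexity comes from.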
A 75-100mV difference between the high-VT and low-VT values is found to be optimal for typical microprocessor design.

Another technique for leakage power control is body bias. Traditionally, reverse body bias (RBB) is applied to increase VT and thus reduce leakage power during idle mode. However, VT modulation achievable by RBB reduces at shorter channel lengths and lower VT values because of worsening short-channel effects and weaker body effect. Therefore, RBB becomes less effective with technology scaling as both VT and channel length are scaled down. The amount of useful reverse bias is limited to 500mV by increasing junction leakage and drain-body junction breakdown during burn-in. Alternately, forward body bias (FBB) can be used during active mode to lower VT and provide the desired performance at low voltages. FBB is withdrawn during idle mode to reduce standby leakage power. Since FBB improves short-channel effects, it provides better VT modulation capability with technology scaling. A 1.1V, 1GHz communication router chip in a 150nm logic technology with FBB demonstrates 3.5X standby leakage power reduction, when compared to lowering VT by process technology. Full chip area and power overheads of on-chip body bias generators and bias grids are only 2% and 1%, respectively.

Leakage power of a chip is the sum of individual transistor leakages. Within-die critical dimension (CD) variations cause lengths of many transistors to be below the target value. Since transistor leakages increase exponentially at smaller lengths, leakages of these devices are the dominant contributors to full-chip leakage. Leakage power estimation tools must, therefore, account for within-die CD variations accurately. Both die-to-die and within-die variations in device parameters dictate the frequency and leakage power distributions of microprocessors in volume manufacturing.
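The effect of within-die CD variation on a full-chip leakage estimate can be illustrated with a small Monte Carlo sketch. The exponential dependence on channel length comes from the text above; everything else is an assumption made for illustration: the Gaussian length distribution, the constants `L_NOM_NM`, `SIGMA_NM`, `LAMBDA_NM`, and `I_NOM_NA`, and the helper `device_leakage_na`. None of these values are taken from the paper.

```python
import math
import random

L_NOM_NM = 100.0   # hypothetical target channel length
SIGMA_NM = 5.0     # hypothetical within-die CD standard deviation
LAMBDA_NM = 10.0   # hypothetical length change per e-fold of leakage
I_NOM_NA = 100.0   # hypothetical leakage of one device at target length


def device_leakage_na(length_nm):
    """Leakage grows exponentially as the channel gets shorter than target."""
    return I_NOM_NA * math.exp((L_NOM_NM - length_nm) / LAMBDA_NM)


random.seed(0)
lengths = [random.gauss(L_NOM_NM, SIGMA_NM) for _ in range(20_000)]
leakages = [device_leakage_na(l) for l in lengths]

# Estimate that ignores variation: every device sits at the target length.
nominal_total = I_NOM_NA * len(lengths)
# Ratio of the variation-aware total to the nominal-length total.
ratio = sum(leakages) / nominal_total
# Share of total leakage contributed by devices shorter than target.
short_share = sum(i for l, i in zip(lengths, leakages) if l < L_NOM_NM) / sum(leakages)
# Closed-form mean of the assumed log-normal leakage, for comparison.
analytic_ratio = math.exp(SIGMA_NM**2 / (2 * LAMBDA_NM**2))

print(f"leakage vs nominal-length estimate: {ratio:.3f} (analytic {analytic_ratio:.3f})")
print(f"share from shorter-than-target devices: {short_share:.2f}")
```

Under these assumptions, an estimate that ignores the length spread is too low, and the shorter-than-target half of the devices contributes well over half of the total.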
Only those dies that meet both minimum frequency and maximum power constraints are acceptable. Bidirectional adaptive body bias (ABB) is effective for compensating for these variations. FBB is applied to speed up dies that are too slow. RBB is used to bring dies that are too leaky within the power envelope. The die acceptance rate and number of dies in the highest frequency bin can be improved significantly by ABB, as demonstrated by measurements on a testchip in 150nm technology.

Leakage current through a stack of two or more "off" transistors is an order of magnitude smaller than a single device leakage. This so-called "stack effect" becomes stronger with technology scaling as drain-induced barrier lowering (DIBL) worsens. Many circuit blocks in a microprocessor already contain a significant number of transistor stacks in complex logic gates. Thus, leakage power depends strongly on the primary input vector to the block. These "natural stacks" can be exploited for standby leakage power reduction by activating the "minimum leakage" input vector during idle mode. A 2X reduction in standby leakage power is achievable for a 32-bit adder with 3-80μs minimum time required in "standby" so that the switching energy consumed for entry into and exit from idle mode is less than 10% of the leakage power saved. In addition, transistors that are not performance-critical can be converted into stacks to reduce leakage without impacting overall processor performance. Thus, "stack forcing" allows one to emulate behavior of high-VT devices not available from the process technology. Using a single-VT process in conjunction with "stack forcing" can reduce leakage power of a 32-bit instruction decoder block by 3X without any performance degradation.

Finally, noise margin degradation of wide-OR domino gates and register files due to excessive leakage requires keeper transistors to be upsized and static stages to be deskewed. The resulting performance loss is severe and unacceptable.
Conditional keepers can be used to provide the desired robustness with minimal performance loss. A pseudo-static local bitline scheme for register files also reduces bitline leakage significantly, allowing target performance to be achieved in the presence of excessive leakage.
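The minimum standby time quoted earlier for the "minimum leakage" input vector follows from a simple break-even condition: the switching energy spent entering and leaving idle mode must stay below a chosen fraction (10% in the abstract) of the leakage energy saved while idle. The sketch below works through that arithmetic. The function name `min_standby_time_s` and both input values are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def min_standby_time_s(transition_energy_j, leakage_power_saved_w,
                       overhead_fraction=0.10):
    """Shortest idle time for which the entry/exit switching energy stays
    below `overhead_fraction` of the leakage energy saved while idle.

    Condition: E_transition <= overhead_fraction * P_saved * t_standby
    """
    if leakage_power_saved_w <= 0 or overhead_fraction <= 0:
        raise ValueError("saved power and overhead fraction must be positive")
    return transition_energy_j / (overhead_fraction * leakage_power_saved_w)


# Made-up example: 2 nJ to apply and release the input vector,
# 1 mW of leakage power saved while the vector is applied.
t_min = min_standby_time_s(transition_energy_j=2e-9, leakage_power_saved_w=1e-3)
print(f"minimum standby time: {t_min * 1e6:.1f} us")
```

Idle periods shorter than this bound spend more than the allowed fraction of the saving on entering and leaving idle mode, so applying the vector is only worthwhile for longer idle intervals.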