Pulse width allocation and clock skew scheduling: optimizing sequential circuits based on pulsed latches

  • Authors:
  • Hyein Lee;Seungwhun Paik;Youngsoo Shin

  • Affiliations:
  • Samsung Electronics, Yongin, Gyeonggi-Do, Korea;Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea;Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea

  • Venue:
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  • Year:
  • 2010

Quantified Score

Hi-index 0.03

Visualization

Abstract

Pulsed latches, latches driven by a brief clock pulse, offer the same convenience of timing verification and optimization as flip-flop-based circuits, while retaining the advantages of latches over flip-flops. But a pulsed latch that uses a single pulse width has a lower bound on its clock period, limiting its capacity to deal with higher frequencies or operate at lower Vdd. The limitation still exists even when clock skew scheduling is employed, since the amount of skew that can be assigned and realized is practically limited due to process variation. For the first time, we formulate the problem of allocating pulse widths, out of a small discrete number of predefined widths, and scheduling clock skews, within a predefined upper bound on skew, for optimizing pulsed latch-based sequential circuits. We then present an algorithm called PWCS_Optimize (pulse width allocation and clock skew scheduling, PWCS) to solve the problem. The allocated skews are realized through synthesis of local clock trees between pulse generators and latches, and a global clock tree between a clock source and pulse generators. Experiments with 65-nm technology demonstrate that combining a small number of different pulse widths with clock skews of up to 10% of the clock period yield the minimum achievable clock period for many benchmark circuits. The results have an average figure of merit of 0.86, where 1.0 indicates a minimum clock period, and the average reduction in area by 11%. The design flow including PWCS_Optimize, placement and routing, and synthesis of local and global clock trees is presented and assessed with example circuits.