The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

  • Authors:
  • M. S. Hrishikesh; Doug Burger; Norman P. Jouppi; Stephen W. Keckler; Keith I. Farkas; Premkishore Shivakumar

  • Affiliations:
  • The University of Texas, Austin; The University of Texas, Austin; Compaq Computer Corporation; The University of Texas, Austin; Compaq Computer Corporation; The University of Texas, Austin

  • Venue:
  • ISCA '02: Proceedings of the 29th Annual International Symposium on Computer Architecture
  • Year:
  • 2002

Quantified Score

Hi-index 0.05

Abstract

Microprocessor clock frequency has improved by nearly 40% annually over the past decade. This improvement has come, in equal measure, from smaller technologies and deeper pipelines. From our study of the SPEC 2000 benchmarks, we find that for a high-performance architecture implemented in 100 nm technology, the optimal clock period is approximately 8 fan-out-of-four (FO4) inverter delays for integer benchmarks, consisting of 6 FO4 of useful work and about 2 FO4 of overhead. The optimal clock period for floating-point benchmarks is 6 FO4. We find these optimal points to be insensitive to latch and clock skew overheads. Our study indicates that further pipelining can at best improve the performance of integer programs by a factor of 2 over current designs. At these high clock frequencies, it will be difficult to design the instruction issue window to operate in a single cycle. Consequently, we propose and evaluate a high-frequency design called a segmented instruction window.
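
The arithmetic behind the figures in the abstract can be sketched as follows. This is a minimal illustration, not the authors' methodology. The abstract gives clock periods only in FO4 units; the conversion to picoseconds below assumes the common rule of thumb that one FO4 delay is about 360 ps per micron of drawn gate length, and treating "100nm" as the drawn gate length is likewise an assumption. All function and constant names are hypothetical.

```python
# Assumption (not stated in the abstract): one FO4 inverter delay is roughly
# 360 ps per micron of drawn gate length. This is a rule of thumb only.
FO4_PS_PER_MICRON = 360.0


def fo4_delay_ps(gate_length_nm: float) -> float:
    """Approximate delay of one FO4 inverter, in picoseconds."""
    return FO4_PS_PER_MICRON * gate_length_nm / 1000.0


def clock_period_fo4(useful_fo4: float, overhead_fo4: float) -> float:
    """Clock period in FO4 units: useful logic per stage plus overhead."""
    return useful_fo4 + overhead_fo4


def clock_frequency_ghz(period_fo4: float, gate_length_nm: float) -> float:
    """Clock frequency in GHz for a period given in FO4 delays."""
    period_ps = period_fo4 * fo4_delay_ps(gate_length_nm)
    return 1000.0 / period_ps  # a 1000 ps period corresponds to 1 GHz


def compound_growth(annual_rate: float, years: int) -> float:
    """Total improvement factor after compounding an annual rate."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # Integer benchmarks: 6 FO4 of useful work plus about 2 FO4 of overhead.
    int_period = clock_period_fo4(useful_fo4=6, overhead_fo4=2)
    print(f"Integer clock period: {int_period} FO4")
    print(f"Overhead share of the period: {2 / int_period:.0%}")
    print(f"Integer clock at 100 nm: {clock_frequency_ghz(int_period, 100):.2f} GHz")

    # Floating-point benchmarks: the abstract gives a 6 FO4 clock period.
    print(f"FP clock at 100 nm: {clock_frequency_ghz(6, 100):.2f} GHz")

    # "Nearly 40% annually over the past decade", compounded over 10 years.
    print(f"Decade frequency gain: {compound_growth(0.40, 10):.1f}x")
```

Under these assumptions, an 8 FO4 period at 100 nm is 288 ps, or roughly 3.5 GHz, and a 40% annual gain compounds to about a 29-fold increase over ten years. The absolute frequencies depend entirely on the assumed FO4-to-picosecond conversion; only the FO4 counts come from the abstract.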