Optimal pipelining in supercomputers

  • Authors:
  • S. R. Kunkel;J. E. Smith

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, Wisconsin;Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, Wisconsin

  • Venue:
  • ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
  • Year:
  • 1986

Quantified Score

Hi-index 0.01

Visualization

Abstract

This paper examines the relationship between the degree of central processor pipelining and performance. This relationship is studied in the context of modern supercomputers. Limitations due to instruction dependencies are studied via simulations of the CRAY-1S. Both scalar and vector code are studied. This study shows that instruction dependencies severely limit performance for scalar code as well as overall performance.The effects of latch overhead are then considered. The primary cause of latch overhead is the difference between maximum and minimum gate propagation delays. This causes both the skewing of data as it passes along the data path, and unintentional clock skewing due to clock fanout logic. Latch overhead is studied analytically in order to lower bound the clock period that may be used in a pipelined system. This analysis also touches on other points related to latch clocking. This analysis shows that for short pipeline segments both the Earle latch and polarity hold latch give the same clock period bound for both single-phase and multi-phase clocks. Overhead due to data skew and unintentional clock skew are each added to the CRAY-1S simulation model. Simulation results with realistic assumptions show that eight to ten gate levels per pipeline segment lead to optimal overall performance. The results also show that for short pipeline segments data skew and clock skew contribute about equally to the degradation in performance.