Characterization of simultaneous multithreading (SMT) efficiency in POWER5

  • Authors:
  • H. M. Mathis; A. E. Mericas; J. D. McCalpin; R. J. Eickemeyer; S. R. Kunkel

  • Venue:
  • IBM Journal of Research and Development - POWER5 and packaging
  • Year:
  • 2005

Abstract

Coarse-grained multithreading, the switching of threads to avoid idle processor time during long-latency events, has been available on IBM systems since 1998. Simultaneous multithreading (SMT), first available on the POWER5™ processor, moves beyond simple thread switching to the maintenance of two thread streams that are issued as continuously as possible to ensure the maximum use of processor resources. Because SMT has the potential to increase processor efficiency and correspondingly increase the amount of work done in a given time span, the reader might suppose that SMT would exhibit a performance gain for all workloads. This is true for most workloads, but not in some exceptional cases. In SMT mode, the processor resources (register sets, caches, queues, translation buffers, and the system memory nest) must be shared by both threads, and conditions can occur that degrade or even obviate the SMT performance improvement. The POWER4™ and POWER5 processors have very powerful performance monitor (PM) toolsets that can help the user determine what is occurring in workloads that may not be providing the expected SMT gains. In this paper, the results of measured differences among workloads having large, medium, small, and even negative SMT performance gains are presented, along with an approach to investigating workloads to determine the source of SMT performance gain limits.