Simultaneous Multithreading: A Platform for Next-Generation Processors

  • Authors: Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, Dean M. Tullsen

  • Venue: IEEE Micro
  • Year: 1997

Abstract

As the processor community prepares for a billion transistors on a chip, researchers continue to debate the most effective way to use them. One approach is to add more memory (either cache or primary memory) to the chip, but the performance gain from memory alone is limited. Another approach is to increase the level of systems integration, bringing support functions like graphics accelerators and I/O controllers on chip. Although integration lowers system costs and communication latency, the overall performance gain to applications is again marginal. We believe the only way to significantly improve performance is to enhance the processor's computational capabilities. In general, this means increasing parallelism in all its available forms. At present, only certain forms of parallelism are being exploited. Current superscalars, for example, can execute four or more instructions per cycle; in practice, however, they achieve only one or two, because current applications have low instruction-level parallelism. Placing multiple superscalar processors on a chip is not an effective solution either: in addition to suffering from low instruction-level parallelism, such a design performs poorly when there is little thread-level parallelism. A better solution is to design a processor that can exploit all types of parallelism well. Simultaneous multithreading (SMT) is a processor design that meets this goal, because it exploits both thread-level and instruction-level parallelism. In SMT processors, thread-level parallelism can come from either multithreaded, parallel programs or individual, independent programs in a multiprogramming workload. Instruction-level parallelism comes from within each individual program or thread. Because they exploit both types of parallelism, and do so simultaneously, SMT processors use resources more efficiently and achieve greater instruction throughput and speedups than either alternative.
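The utilization argument above can be illustrated with a deliberately simplified issue-slot model. This is a sketch under stated assumptions, not the authors' simulator: the 4-wide issue width, the synthetic per-thread distribution of ready instructions, the two-core static split standing in for an on-chip multiprocessor, and every function name (`make_trace`, `superscalar_ipc`, `cmp_ipc`, `smt_ipc`) are hypothetical. The model ignores caches, branch prediction, fetch limits, and pipeline hazards; it only shows how letting any thread fill any free issue slot in the same cycle changes how many slots get used.

```python
import random

ISSUE_WIDTH = 4               # hypothetical total issue slots per cycle
READY_VALUES = [0, 1, 2, 3]   # ready independent instructions a thread offers per cycle
READY_WEIGHTS = [3, 4, 2, 1]  # hypothetical skew toward low instruction-level parallelism


def make_trace(threads: int, cycles: int, seed: int = 0) -> list[list[int]]:
    """Per-cycle ready-instruction counts for each thread (synthetic, not measured)."""
    rng = random.Random(seed)
    return [rng.choices(READY_VALUES, weights=READY_WEIGHTS, k=threads)
            for _ in range(cycles)]


def superscalar_ipc(trace: list[list[int]]) -> float:
    """One thread owns all ISSUE_WIDTH slots but is limited by its own ILP."""
    issued = sum(min(ISSUE_WIDTH, row[0]) for row in trace)
    return issued / len(trace)


def cmp_ipc(trace: list[list[int]], threads: int, cores: int = 2) -> float:
    """Slots are statically partitioned across cores; a core without a thread idles."""
    per_core = ISSUE_WIDTH // cores
    active = min(threads, cores)
    issued = sum(min(per_core, r) for row in trace for r in row[:active])
    return issued / len(trace)


def smt_ipc(trace: list[list[int]], threads: int) -> float:
    """Every thread competes for every free slot in the same cycle."""
    issued = 0
    for row in trace:
        free = ISSUE_WIDTH
        for ready in row[:threads]:
            take = min(free, ready)  # this thread fills whatever slots are still free
            issued += take
            free -= take
    return issued / len(trace)
```

Running all three functions on one trace, for example `trace = make_trace(threads=4, cycles=10_000)`, keeps the comparison fair because every design sees the same ready instructions. By construction of this toy model, `smt_ipc(trace, 1)` equals `superscalar_ipc(trace)`, SMT never issues fewer instructions than the partitioned design running the same threads, and the partitioned design with a single thread never beats the superscalar, which echoes the abstract's point about low thread-level parallelism. The absolute numbers depend entirely on the made-up weights and say nothing about real hardware.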