Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

  • Authors:
  • Dean M. Tullsen, Susan J. Eggers, Joel S. Emer, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm

  • Affiliations:
  • Dept. of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA (Tullsen, Eggers, Levy, Lo)
  • Digital Equipment Corporation, HLO2-3/J3, 77 Reed Road, Hudson, MA (Emer, Stamm)

  • Venue:
  • ISCA '96: Proceedings of the 23rd Annual International Symposium on Computer Architecture
  • Year:
  • 1996

Abstract

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.
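The "favor for fetch and issue" mechanism described in the abstract refers to the paper's fetch-priority heuristics, of which ICOUNT (prioritizing threads with the fewest instructions in the decode, rename, and queue stages) performed best. The sketch below is a minimal, hypothetical model of that priority computation only; the thread count, occupancy numbers, and function names are illustrative assumptions, not the paper's implementation.

```c
/*
 * Minimal sketch of an ICOUNT-style fetch-priority heuristic: each cycle,
 * fetch from the thread with the fewest instructions in the pre-issue
 * pipeline stages (decode, rename, instruction queues).  All names and
 * sizes are illustrative assumptions.
 */
#include <stdio.h>

#define NUM_THREADS 8

typedef struct {
    int id;
    int inflight;   /* instructions in decode/rename/queue stages */
} thread_state;

/* Return the index of the thread with the fewest in-flight instructions.
 * Ties break toward the lower index, a simplification of the rotating
 * priority a real fetch unit might use. */
static int icount_pick(const thread_state threads[], int n)
{
    int best = 0;
    for (int i = 1; i < n; i++) {
        if (threads[i].inflight < threads[best].inflight)
            best = i;
    }
    return best;
}

int main(void)
{
    /* Hypothetical snapshot of per-thread pipeline occupancy. */
    thread_state threads[NUM_THREADS] = {
        {0, 12}, {1, 3}, {2, 7}, {3, 15},
        {4, 5},  {5, 9}, {6, 2}, {7, 11},
    };

    int chosen = icount_pick(threads, NUM_THREADS);
    printf("fetch this cycle from thread %d (%d instructions in flight)\n",
           chosen, threads[chosen].inflight);
    return 0;
}
```

In the paper's best-performing variant, ICOUNT.2.8, the two highest-priority threads share the fetch bandwidth each cycle (up to eight instructions total); the single-thread pick above captures only the priority computation, not that partitioning.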