Chip multiprocessors with speculative multithreading: design for performance and energy efficiency

  • Authors:
  • Josep Torrellas;Jose Renau

  • Affiliations:
  • University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign

  • Venue:
  • Chip multiprocessors with speculative multithreading: design for performance and energy efficiency
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

While Chip Multiprocessors (CMP) with Speculative Multithreading (SM) support have been gaining momentum, experienced processor designers in industry have reservations about their practical implementation. SM CMPs must exploit multiple sources of speculative task-level parallelism, if they want to achieve enough performance improvement for non-numerical applications. Additionally, it is felt that SM is too energy-inefficient to compete against conventional superscalars. This thesis challenges for the first time the commonly-held view that SM consumes excessive energy. It shows a CMP with SM support that is not only faster but also more energy efficient than a state-of-the-art wide-issue superscalar. This is demonstrated with a new energy-efficient CMP micro-architecture. To achieve these results, this thesis is also the first one to propose micro-architectural mechanisms that, taken together, fundamentally enable fast SM with out-of-order spawn in a CMP. These simple mechanisms are: Splitting Timestamp Intervals, the Immediate Successor List, and Dynamic Task Merging. To evaluate them, we develop a SM compiler with and without out-of-order spawn. In addition, the thesis identifies the sources of energy consumption in SM, and proposes energy-centric optimizations that mitigate them. Experiments with the SpecInt 2000 codes show that a CMP with 4 3-issue cores and support for SM delivers a speedup of 1.27 over a 3-issue superscalar. The SM CMP is even faster than a 6-issue superscalar at the same frequency, and consumes only 85% of its power. In fact, for the same average power in both chips, the SM CMP is 1.13 times faster than the 6-issue superscalar on average.