On the potential of latency tolerant execution in speculative multithreading

  • Authors:
  • Haitham Akkary; Komal Jothi; Renjith Retnamma; Satyanarayana Nekkalapu; Doug Hall; Shahrokh Shahidzadeh

  • Affiliations:
  • American University of Beirut; Portland State University; Portland State University; Portland State University; Portland State University; Intel Corporation

  • Venue:
  • IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
  • Year:
  • 2008

Abstract

High-performance superscalar architectures used to exploit instruction-level parallelism in single-thread applications have become too complex and too power-hungry for the many-core processor era. We propose a new architecture that uses multiple latency-tolerant in-order cores to improve single-thread performance, without requiring complex out-of-order execution hardware or large, power-hungry register files and instruction buffers. Using simple cores to provide improved single-thread performance for conventional difficult-to-parallelize applications allows designers to place many more of these cores on the same die. Consequently, emerging highly parallel applications can take full advantage of the many-core parallel hardware without sacrificing the performance of inherently serial applications. Our architecture splits single-thread program execution into disjoint control and data threads that execute concurrently on multiple latency-tolerant in-order cores. Hence we call this style of execution Disjoint Out-of-Order Execution (DOE). DOE is a novel implementation of Speculative Multithreading (SpMT). It uses latency tolerance to overcome performance issues of SpMT caused by load imbalance and inter-thread data communication delays. Using control independence prediction hardware to spawn threads, we simulate the potential performance of DOE on a subset of SPEC2000 integer benchmarks under various parallelism scenarios and for DOE configurations of 2, 4, 6, and 8 single-issue latency-tolerant cores.