On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings

  • Authors:
  • Arun Kejariwal;Xinmin Tian;Wei Li;Milind Girkar;Sergey Kozhukhov;Hideki Saito;Utpal Banerjee;Alexandru Nicolau;Alexander V. Veidenbaum;Constantine D. Polychronopoulos

  • Affiliations:
  • University of California at Irvine, Irvine, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Novosibirsk, Russia;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;University of California at Irvine, Irvine, CA;University of California at Irvine, Irvine, CA;University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue:
  • Proceedings of the 20th annual international conference on Supercomputing
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent research in thread-level speculation (TLS) has proposed several mechanisms for optimistic execution of difficult-to-analyze serial codes in parallel. Though it has been shown that TLS helps to achieve higher levels of parallelism, evaluation of the unique performance potential of TLS, i.e., performance gain that be achieved only through speculation, has not received much attention. In this paper, we evaluate this aspect, by separating the speedup achievable via true TLP (thread-level parallelism) and TLS, for the SPEC CPU2000 benchmark. Further, we dissect the performance potential of each type of speculation --- control speculation, data dependence speculation and data value speculation. To the best of our knowledge, this is the first dissection study of its kind. Assuming an oracle TLS mechanism --- which corresponds to perfect speculation and zero threading overhead --- whereby the execution time of a candidate program region (for speculative execution) can be reduced to zero, our study shows that, at the loop-level, the upper bound on the arithmetic mean and geometric mean speedup achievable via TLS across SPEC CPU2000 is 39.16% (standard deviation = 31.23) and 18.18% respectively.