Tight analysis of the performance potential of thread speculation using spec CPU 2006

  • Authors:
  • Arun Kejariwal;Xinmin Tian;Milind Girkar;Wei Li;Sergey Kozhukhov;Utpal Banerjee;Alexander Nicolau;Alexander V. Veidenbaum;Constantine D. Polychronopoulos

  • Affiliations:
  • University of California, Irvine, Irvine, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;University of California, Irvine, Irvine, CA;University of California, Irvine, Irvine, CA;University of Illinois at Urbana-Champaign, Urbana-Champaign, IL

  • Venue:
  • Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Multi-cores such as the Intel®1 Core™2 Duo processor, facilitate efficient thread-level parallel execution of ordinary programs, wherein the different threads-of-execution are mapped onto different physical processors. In this context, several techniques have been proposed for auto-parallelization of programs. Recently, thread-level speculation (TLS) has been proposed as a means to parallelize difficult-to-analyze serial codes. In general, more than one technique can be employed for parallelizing a given program. The overlapping nature of the applicability of the various techniques makes it hard to assess the intrinsic performance potential of each. In this paper, we present a tight analysis of the (unique) performance potential of both: (a) TLS in general and (b) specific types of thread-level speculation, viz., control speculation, data dependence speculation and data value speculation, for the SPEC2 CPU2006 benchmark suite in light of the various limiting factors such as the threading overhead and misspeculation penalty. To the best of our knowledge, this is the first evaluation of TLS based on SPEC CPU2006 and accounts for the aforementioned real-life con-straints. Our analysis shows that, at the innermost loop level, the upper bound on the speedup uniquely achievable via TLS with the state-of-the-art thread implementations for both SPEC CINT2006 and CFP2006 is of the order of 1%.