Tight analysis of the performance potential of thread speculation using SPEC CPU 2006

Authors:
Arun Kejariwal;Xinmin Tian;Milind Girkar;Wei Li;Sergey Kozhukhov;Utpal Banerjee;Alexander Nicolau;Alexander V. Veidenbaum;Constantine D. Polychronopoulos
Affiliations:
University of California, Irvine, Irvine, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;Intel Corporation, Santa Clara, CA;University of California, Irvine, Irvine, CA;University of California, Irvine, Irvine, CA;University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
Venue:
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2007

Abstract

Multi-core processors such as the Intel® Core™2 Duo facilitate efficient thread-level parallel execution of ordinary programs, wherein different threads of execution are mapped onto different physical processors. In this context, several techniques have been proposed for the auto-parallelization of programs. Recently, thread-level speculation (TLS) has been proposed as a means to parallelize difficult-to-analyze serial codes. In general, more than one technique can be employed to parallelize a given program, and the overlap in the applicability of the various techniques makes it hard to assess the intrinsic performance potential of each. In this paper, we present a tight analysis of the (unique) performance potential of both (a) TLS in general and (b) specific types of thread-level speculation, viz., control speculation, data dependence speculation, and data value speculation, for the SPEC CPU2006 benchmark suite, in light of limiting factors such as the threading overhead and the misspeculation penalty. To the best of our knowledge, this is the first evaluation of TLS that is based on SPEC CPU2006 and accounts for the aforementioned real-life constraints. Our analysis shows that, at the innermost loop level, the upper bound on the speedup uniquely achievable via TLS with state-of-the-art thread implementations is on the order of 1% for both SPEC CINT2006 and CFP2006.
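A simple Amdahl's-law estimate helps show why such an upper bound can be so small. The Python sketch below is an illustration under stated assumptions, not the paper's evaluation methodology. It assumes that a fraction f of serial execution time lies in loops that only TLS can parallelize, that this fraction is divided ideally across p cores, and that the threading overhead and misspeculation penalty can be lumped into one additive term expressed as a fraction of serial execution time. The function name tls_speedup_bound and all input values are hypothetical.

```python
def tls_speedup_bound(f: float, p: int, overhead: float = 0.0) -> float:
    """Amdahl-style upper bound on whole-program speedup from TLS.

    Hypothetical model (not the paper's methodology):
      f        -- fraction of serial run time in loops only TLS can parallelize
      p        -- number of cores running the speculative threads
      overhead -- threading overhead plus misspeculation penalty, as a
                  fraction of the original serial run time
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    if overhead < 0.0:
        raise ValueError("overhead must be non-negative")
    # The non-TLS part runs unchanged; the TLS-covered part is ideally split
    # across p cores; the lumped overhead term is added on top.
    new_time = (1.0 - f) + f / p + overhead
    return 1.0 / new_time


if __name__ == "__main__":
    # Hypothetical inputs: 2% TLS-only coverage on a dual-core processor.
    print(f"{tls_speedup_bound(0.02, 2):.4f}")  # about 1.0101, i.e. ~1%
    # Same coverage with a hypothetical overhead of 0.5% of serial time.
    print(f"{tls_speedup_bound(0.02, 2, overhead=0.005):.4f}")
```

With a hypothetical f = 0.02 on a dual-core processor and no overhead, the bound is 1/(0.98 + 0.01), roughly 1.0101, i.e. about a 1% gain. Any positive overhead term lowers it further, which is why small unique TLS coverage combined with real threading and misspeculation costs leaves little headroom.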