Multi-core processors such as the Intel® Core™2 Duo facilitate efficient thread-level parallel execution of ordinary programs, in which the different threads of execution are mapped onto different physical processors. In this context, several techniques have been proposed for the auto-parallelization of programs. Recently, thread-level speculation (TLS) has been proposed as a means to parallelize difficult-to-analyze serial codes. In general, more than one technique can be employed to parallelize a given program, and the overlapping applicability of the various techniques makes it hard to assess the intrinsic performance potential of each. In this paper, we present a tight analysis of the (unique) performance potential of (a) TLS in general and (b) specific types of thread-level speculation, viz., control speculation, data dependence speculation, and data value speculation, for the SPEC CPU2006 benchmark suite in light of limiting factors such as threading overhead and misspeculation penalty. To the best of our knowledge, this is the first evaluation of TLS that is based on SPEC CPU2006 and accounts for the aforementioned real-life constraints. Our analysis shows that, at the innermost loop level, the upper bound on the speedup uniquely achievable via TLS with state-of-the-art thread implementations for both SPEC CINT2006 and CFP2006 is of the order of 1%.
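To make the interplay of the limiting factors concrete, the following is a minimal Amdahl-style sketch of how threading overhead and misspeculation penalty can cap the whole-program speedup obtained from speculatively parallelizing one loop. It is not the model used in the paper: the function `tls_speedup`, its parameters, and all numbers in the usage example are hypothetical and chosen only for illustration.

```python
def tls_speedup(loop_fraction, threads, overhead, misspec_rate, penalty):
    """Toy whole-program speedup estimate for one speculatively parallelized loop.

    All parameters are illustrative assumptions, not values from the paper:
      loop_fraction: share of serial run time spent in the loop (0..1)
      threads:       number of threads running loop iterations
      overhead:      threading cost, as a share of serial run time
      misspec_rate:  share of loop work that is squashed and re-executed
      penalty:       extra relative cost of each re-executed unit of work
    """
    # Work that speculates correctly is divided evenly among the threads.
    good = (1.0 - misspec_rate) / threads
    # Misspeculated work is redone serially and also pays the squash penalty.
    bad = misspec_rate * (1.0 + penalty)
    loop_time = loop_fraction * (good + bad)
    # Normalized parallel run time: untouched serial part + loop + overhead.
    total_time = (1.0 - loop_fraction) + loop_time + overhead
    return 1.0 / total_time


if __name__ == "__main__":
    # Hypothetical inputs: a small hot loop, two threads, modest overheads.
    s = tls_speedup(loop_fraction=0.05, threads=2, overhead=0.01,
                    misspec_rate=0.10, penalty=0.5)
    print(f"estimated speedup: {s:.4f}")
```

In this sketch, a loop that covers only a small share of the run time leaves little room for improvement, and the fixed overhead term is charged regardless of how well speculation succeeds, so the estimated speedup stays close to 1 for such inputs.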