On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings

Authors and affiliations:
Arun Kejariwal, University of California at Irvine, Irvine, CA
Xinmin Tian, Intel Corporation, Santa Clara, CA
Wei Li, Intel Corporation, Santa Clara, CA
Milind Girkar, Intel Corporation, Santa Clara, CA
Sergey Kozhukhov, Intel Corporation, Novosibirsk, Russia
Hideki Saito, Intel Corporation, Santa Clara, CA
Utpal Banerjee, Intel Corporation, Santa Clara, CA
Alexandru Nicolau, University of California at Irvine, Irvine, CA
Alexander V. Veidenbaum, University of California at Irvine, Irvine, CA
Constantine D. Polychronopoulos, University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
Proceedings of the 20th annual international conference on Supercomputing
Year:
2006

Abstract

Recent research in thread-level speculation (TLS) has proposed several mechanisms for the optimistic parallel execution of difficult-to-analyze serial code. Although TLS has been shown to help achieve higher levels of parallelism, the unique performance potential of TLS, i.e., the performance gain that can be achieved only through speculation, has received little attention. In this paper, we evaluate this aspect by separating the speedup achievable via true TLP (thread-level parallelism) from the speedup achievable via TLS for the SPEC CPU2000 benchmark suite. Further, we dissect the performance potential of each type of speculation: control speculation, data dependence speculation, and data value speculation. To the best of our knowledge, this is the first dissection study of its kind. We assume an oracle TLS mechanism, corresponding to perfect speculation and zero threading overhead, whereby the execution time of a candidate program region (for speculative execution) can be reduced to zero. Under this assumption, our study shows that, at the loop level, the upper bounds on the arithmetic mean and geometric mean speedup achievable via TLS across SPEC CPU2000 are 39.16% (standard deviation = 31.23) and 18.18%, respectively.
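As a rough illustration of how an oracle upper bound of this kind can be computed and aggregated, the Python sketch below summarizes each benchmark by one hypothetical number: the fraction f of its baseline execution time spent in loops that can be parallelized only through speculation. Under the oracle assumption (perfect speculation, zero threading overhead) that time drops to zero, so the speedup ratio is 1/(1 - f). The function names, the percent convention (ratio minus one, times 100), the use of the population standard deviation, and taking the geometric mean over speedup ratios are all assumptions of this sketch, not the paper's stated methodology, and the sample inputs are made up.

```python
import statistics
from typing import Sequence, Tuple


def oracle_speedup_pct(tls_only_fraction: float) -> float:
    """Upper-bound speedup, in percent, when TLS-only region time is reduced to zero.

    tls_only_fraction (hypothetical input): share of baseline execution time spent
    in loops that can be parallelized only speculatively, with 0 <= f < 1.
    """
    if not 0.0 <= tls_only_fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    # Oracle model: T_oracle = (1 - f) * T_baseline, so the ratio is 1 / (1 - f).
    ratio = 1.0 / (1.0 - tls_only_fraction)
    # Assumed convention: report the gain as (ratio - 1) * 100 percent.
    return (ratio - 1.0) * 100.0


def summarize(fractions: Sequence[float]) -> Tuple[float, float, float]:
    """Return (arithmetic mean %, population std dev, geometric mean %) over benchmarks."""
    pcts = [oracle_speedup_pct(f) for f in fractions]
    arith = statistics.mean(pcts)
    stdev = statistics.pstdev(pcts)
    # The geometric mean is taken over ratios (all >= 1), then converted back to percent,
    # so a benchmark with no speculative-only loops contributes a ratio of 1, not 0.
    ratios = [1.0 + p / 100.0 for p in pcts]
    geo = (statistics.geometric_mean(ratios) - 1.0) * 100.0
    return arith, stdev, geo


if __name__ == "__main__":
    # Hypothetical fractions for three made-up benchmarks (not SPEC CPU2000 data).
    arith, stdev, geo = summarize([0.0, 0.2, 0.5])
    print(f"arithmetic mean = {arith:.2f}%, std dev = {stdev:.2f}, geometric mean = {geo:.2f}%")
```

Because the arithmetic mean of the percentages equals the arithmetic mean of the ratios minus one (times 100), the AM-GM inequality guarantees the arithmetic figure is never below the geometric one; a few benchmarks with large speculative-only fractions pull the arithmetic mean up much more than the geometric mean.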