MASA: a multithreaded processor architecture for parallel symbolic computing
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Run-time disambiguation: coping with statically unpredictable dependencies
IEEE Transactions on Computers
Region Scheduling: An Approach for Detecting and Redistributing Parallelism
IEEE Transactions on Software Engineering
IBM RISC System/6000 processor architecture
IBM Journal of Research and Development
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Speculative disambiguation: a compilation technique for dynamic memory disambiguation
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Limits of Data Value Predictability
International Journal of Parallel Programming
IEEE Transactions on Parallel and Distributed Systems
Silent Stores and Store Value Locality
IEEE Transactions on Computers
Dependence Analysis for Supercomputing
Dependence Analysis for Supercomputing
Loop-Level Parallelism in Numeric and Symbolic Programs
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Control Speculation in Multithreaded Processors through Dynamic Loop Detection
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
In Search of Speculative Thread-Level Parallelism
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
A Quantitative Assessment of Thread-Level Speculation Techniques
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
The optimization of horizontal microcode within and beyond basic blocks: an application of processor scheduling with resources
A General Compiler Framework for Speculative Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Helper Threads via Virtual Multithreading
IEEE Micro
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Automatic detection of saturation and clipping idioms
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Implicit parallelism with ordered transactions
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Tight analysis of the performance potential of thread speculation using spec CPU 2006
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Function level parallelism driven by data dependencies
ACM SIGARCH Computer Architecture News
Software behavior oriented parallelization
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Cache-aware iteration space partitioning
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
On the exploitation of loop-level parallelism in embedded applications
ACM Transactions on Embedded Computing Systems (TECS)
Techniques for efficient placement of synchronization primitives
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallelization spectroscopy: analysis of thread-level parallelism in hpc programs
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Synchronization optimizations for efficient execution on multi-cores
Proceedings of the 23rd international conference on Supercomputing
Exploitation of nested thread-level speculative parallelism on multi-core systems
Proceedings of the 7th ACM international conference on Computing frontiers
A profile-based tool for finding pipeline parallelism in sequential programs
Parallel Computing
Shared Register File Based ILP for Multicore
GREENCOM-CPSCOM '10 Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing
Parallel programming of general-purpose programs using task-based programming models
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Hi-index | 0.00 |
Recent research in thread-level speculation (TLS) has proposed several mechanisms for optimistic execution of difficult-to-analyze serial codes in parallel. Though it has been shown that TLS helps to achieve higher levels of parallelism, evaluation of the unique performance potential of TLS, i.e., performance gain that be achieved only through speculation, has not received much attention. In this paper, we evaluate this aspect, by separating the speedup achievable via true TLP (thread-level parallelism) and TLS, for the SPEC CPU2000 benchmark. Further, we dissect the performance potential of each type of speculation --- control speculation, data dependence speculation and data value speculation. To the best of our knowledge, this is the first dissection study of its kind. Assuming an oracle TLS mechanism --- which corresponds to perfect speculation and zero threading overhead --- whereby the execution time of a candidate program region (for speculative execution) can be reduced to zero, our study shows that, at the loop-level, the upper bound on the arithmetic mean and geometric mean speedup achievable via TLS across SPEC CPU2000 is 39.16% (standard deviation = 31.23) and 18.18% respectively.