Thread Partitioning and Value Prediction for Exploiting Speculative Thread-Level Parallelism
IEEE Transactions on Computers
Reducing misspeculation overhead for module-level speculative execution
Proceedings of the 2nd conference on Computing frontiers
Partially ordered epochs for thread-level speculation
Proceedings of the 2nd conference on Computing frontiers
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation
Proceedings of the 19th annual international conference on Supercomputing
SableSpMT: a software framework for analysing speculative multithreading in Java
PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Proceedings of the 33rd annual international symposium on Computer Architecture
Proceedings of the 20th annual international conference on Supercomputing
A profile-based tool for finding pipeline parallelism in sequential programs
Parallel Computing
Estimating and exploiting potential parallelism by source-level dependence profiling
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Hi-index | 0.00 |
Abstract: This paper considers program modules, e.g. procedures, functions, and methods as the basic method to exploit speculative parallelism in existing codes. We analyze how much inherent and exploitable parallelism exist in a set of C and Java programs on a set of chip-multiprocessor architecture models, and identify what inherent program features, as well as architectural deficiencies, that limit the speedup. Our data complement previous limit studies by indicating that the programming style - object-oriented versus imperative - does not seem to have any noticeable impact on the achievable speedup. Further, we show that as few as eight processors are enough to exploit all of the inherent parallelism. However, memory-level data dependence resolution and thread management mechanisms of recent CMP proposals may impose overheads that severely limit the speedup obtained.