A Scheme to Enforce Data Dependence on Large Multiprocessor Systems
IEEE Transactions on Software Engineering
Compiler algorithms for synchronization
IEEE Transactions on Computers
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Run-time parallelization: its time has come
Parallel Computing - Special issues on languages and compilers for parallel computers
IEEE Transactions on Parallel and Distributed Systems
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
The Superthreaded Processor Architecture
IEEE Transactions on Computers
Removing architectural bottlenecks to the scalability of speculative parallelization
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Techniques for speculative run-time parallelization of loops
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
In Search of Speculative Thread-Level Parallelism
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops
IPDPS '02 Proceedings of the 16th International Symposium on Parallel and Distributed Processing
Scientific Computations on Modern Parallel Vector Systems
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Design Space Exploration of a Software Speculative Parallelization Scheme
IEEE Transactions on Parallel and Distributed Systems
Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
ReSlice: Selective Re-Execution of Long-Retired Misspeculated Instructions Using Forward Slicing
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
A Speculative Parallel DFA Membership Test for Multicore, SIMD and Cloud Computing Environments
International Journal of Parallel Programming
Hi-index | 0.00 |
Existing runtime parallelization techniques impose severe performance penalties when a speculative parallelization is attempted and fails. Some techniques require a sequential restart of the speculative execution while others only disregard the work after the first point of failure. This paper introduces a new technique that reduces the performance overhead of failure to less than 1% on standard processors through a combination of hoisting the failure path and partitioning work to a Coinspector Thread.