Interprocedural dependence analysis and parallelization
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Disjoint eager execution: an optimal form of speculative execution
Proceedings of the 28th annual international symposium on Microarchitecture
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
IEEE Transactions on Computers
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Highly accurate data value prediction using hybrid predictors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Multipath execution: opportunities and limits
ICS '98 Proceedings of the 12th international conference on Supercomputing
Threaded multiple path execution
Proceedings of the 25th annual international symposium on Computer architecture
Maximizing parallelism and minimizing synchronization with affine partitions
Parallel Computing - Special issues on languages and compilers for parallel computers
Data speculation support for a chip multiprocessor
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
IEEE Transactions on Parallel and Distributed Systems
Clustered speculative multithreaded processors
ICS '99 Proceedings of the 13th international conference on Supercomputing
The Superthreaded Processor Architecture
IEEE Transactions on Computers
The parallel execution of DO loops
Communications of the ACM
Removing architectural bottlenecks to the scalability of speculative parallelization
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Techniques for speculative run-time parallelization of loops
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Compiler optimization of scalar value communication between speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Toward efficient and robust software speculative parallelization on multiprocessors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Optimistic parallelism requires abstractions
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Software behavior oriented parallelization
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Optimistic parallelism benefits from data partitioning
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Copy or Discard execution model for speculative parallelization on multicores
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Fast Track: A Software System for Speculative Program Optimization
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Speculative parallelization of sequential loops on multicores
International Journal of Parallel Programming
Supporting speculative parallelization in the presence of dynamic data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Supporting speculative parallelization in the presence of dynamic data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
SpiceC: scalable parallelism via implicit copying and explicit commit
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Enhanced speculative parallelization via incremental recovery
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
General data structure expansion for multi-threading
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Practical speculative parallelization of variable-length decompression algorithms
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Hi-index | 0.00 |
With the availability of chip multiprocessor (CMP) and simultaneous multithreading (SMT) machines, extracting thread level parallelism from a sequential program has become crucial for improving performance. However, many sequential programs cannot be easily parallelized due to the presence of dependences. To solve this problem, different solutions have been proposed. Some of them make the optimistic assumption that such dependences rarely manifest themselves at runtime. However, when this assumption is violated, the recovery causes very large overhead. Other approaches incur large synchronization or computation overhead when resolving the dependences. Consequently, for a loop with frequently arising cross-iteration dependences, previous techniques are not able to speed up the execution. In this paper we propose a compiler technique which uses state separation and multiple value prediction to speculatively parallelize loops in sequential programs that contain frequently arising cross-iteration dependences. The key idea is to generate multiple versions of a loop iteration based on multiple predictions of values of variables involved in cross-iteration dependences (i.e., live-in variables). These speculative versions and the preceding loop iteration are executed in separate memory states simultaneously. After the execution, if one of these versions is correct (i.e., its predicted values are found to be correct), then we merge its state and the state of the preceding iteration because the dependence between the two iterations is correctly resolved. The memory states of other incorrect versions are completely discarded. Based on this idea, we further propose a runtime adaptive scheme that not only gives a good performance but also achieves better CPU utilization. We conducted experiments on 10 benchmark programs on a real machine. The results show that our technique can achieve 1.7x speedup on average across all used benchmarks.