Scalable Speculative Parallelization on Commodity Clusters
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Automatic parallelization for clusters is a promising alternative to time-consuming, error-prone manual parallelization. However, automatic parallelization is frequently limited by the imprecision of static analysis. Moreover, because static analysis is inherently fragile, small changes to the source code can significantly undermine performance. Replacing static analysis with speculation and profiling makes automatic parallelization more robust and more widely applicable. Naïve automatic speculative parallelization, however, does not scale on distributed-memory clusters because of the high bandwidth required to validate speculation. This work presents the first automatic speculative DOALL (Spec-DOALL) parallelization system for clusters. We have implemented a prototype, called Cluster Spec-DOALL, which consists of a Spec-DOALL parallelizing compiler and a speculative runtime for clusters. Because the compiler optimizes communication patterns and the runtime is optimized for the cases in which speculation succeeds, Cluster Spec-DOALL minimizes the communication and validation overheads of the speculative runtime. Across 8 benchmarks, Cluster Spec-DOALL achieves a geometric mean speedup of 43.8x on a 120-core cluster, whereas DOALL without speculation achieves a speedup of only 4.5x. This demonstrates that speculation makes scalable, fully automatic parallelization for clusters possible.
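To make the general idea of speculative DOALL concrete, the following is a minimal single-process sketch, not the Cluster Spec-DOALL compiler or runtime described above. It assumes a loop body that touches shared state only through a logging view; the names `LoggedView` and `spec_doall` are hypothetical and introduced purely for illustration. Iterations run in parallel with buffered writes, a validation pass checks that no iteration read a location written by an earlier iteration, and on misspeculation the speculative work is discarded and the loop is re-run sequentially.

```python
from concurrent.futures import ThreadPoolExecutor


class LoggedView:
    """Hypothetical private view: records reads and buffers writes."""

    def __init__(self, shared):
        self._shared = shared
        self.reads = set()
        self.writes = {}

    def load(self, key):
        if key in self.writes:  # reading our own buffered write
            return self.writes[key]
        self.reads.add(key)
        return self._shared[key]

    def store(self, key, value):
        self.writes[key] = value


def spec_doall(body, n_iters, shared, workers=4):
    """Run body(i, view) for each i, speculating that iterations are independent.

    Returns True if speculation succeeded, False if the loop had to be
    re-executed sequentially. Either way `shared` ends in the sequential result.
    """

    def run(i):
        view = LoggedView(shared)
        body(i, view)
        return view

    # Speculative phase: shared state is only read; all writes are buffered.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        views = list(pool.map(run, range(n_iters)))

    # Validation: a later iteration must not have read what an earlier one wrote.
    written = set()
    valid = True
    for view in views:
        if not written.isdisjoint(view.reads):
            valid = False
            break
        written.update(view.writes)

    if valid:
        # Commit buffered writes in iteration order.
        for view in views:
            shared.update(view.writes)
        return True

    # Misspeculation: discard all buffers and re-run the loop sequentially.
    for i in range(n_iters):
        view = LoggedView(shared)
        body(i, view)
        shared.update(view.writes)
    return False
```

In this sketch validation happens only after all iterations finish and recovery re-executes the entire loop, which is the simplest correct policy. A cluster runtime of the kind the abstract describes would additionally have to keep the communication and validation traffic low, which this shared-memory illustration does not attempt to model.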