Scalable Speculative Parallelization on Commodity Clusters

Authors:
Hanjun Kim;Arun Raman;Feng Liu;Jae W. Lee;David I. August
Affiliations:
-;-;-;-;-
Venue:
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2010

Citing 30
Cited 9

TreadMarks: Shared Memory Computing on Networks of Workstations

Computer
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Optimizing compilers for modern architectures: a dependence-based approach

Optimizing compilers for modern architectures: a dependence-based approach
Master/slave speculative parallelization

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
The STAMPede approach to thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Compiler optimization of value communication for thread-level speculation

Compiler optimization of value communication for thread-level speculation
X10: an object-oriented approach to non-uniform cluster computing

OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Automatic Thread Extraction with Decoupled Software Pipelining

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Exploiting distributed version concurrency in a transactional memory cluster

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Support for High-Frequency Streaming in CMPs

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
UPC: Distributed Shared Memory Programming (Wiley Series on Parallel and Distributed Computing)

UPC: Distributed Shared Memory Programming (Wiley Series on Parallel and Distributed Computing)
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Implementation and evaluation of shared-memory communication and synchronization operations in MPICH2 using the Nemesis communication subsystem

Parallel Computing
Speculative Decoupled Software Pipelining

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Revisiting the Sequential Programming Model for Multi-Core

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
A Practical Approach to Exploiting Coarse-Grained Pipeline Parallelism in C Programs

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Software transactional memory for large scale clusters

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Software thread-level speculation: an optimistic library implementation

Proceedings of the 1st international workshop on Multicore software engineering
DiSTM: A Software Transactional Memory Framework for Clusters

ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Computer Organization and Design, Fourth Edition, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design)

Computer Organization and Design, Fourth Edition, Fourth Edition: The Hardware/Software Interface (The Morgan Kaufmann Series in Computer Architecture and Design)
Copy or Discard execution model for speculative parallelization on multicores

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Intelligent speculation for pipelined multithreading

Intelligent speculation for pipelined multithreading
The velocity compiler: extracting efficient multicore execution from legacy sequential codes

The velocity compiler: extracting efficient multicore execution from legacy sequential codes
D2STM: Dependable Distributed Software Transactional Memory

PRDC '09 Proceedings of the 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing
Speculative parallelization using software multi-threaded transactions

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Decoupled software pipelining creates parallelization opportunities

Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

Kismet: parallel speedup estimates for serial programs

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Shared work list: hacking amorphous data parallelism in UPC

Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Complementing user-level coarse-grain parallelism with implicit speculative parallelism

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
HELIX: automatic parallelization of irregular programs for chip multiprocessing

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Automatic speculative DOALL for clusters

Proceedings of the Tenth International Symposium on Code Generation and Optimization
Dynamically dispatching speculative threads to improve sequential execution

ACM Transactions on Architecture and Code Optimization (TACO)
Optimizing software runtime systems for speculative parallelization

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Unifying thread-level speculation and transactional memory

Proceedings of the 13th International Middleware Conference
Compiling affine loop nests for distributed-memory parallel architectures

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Quantified Score

Hi-index	0.01

Visualization

Abstract

While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In particular, high inter-node communication cost and lack of globally shared memory appear to make clusters suitable only for server applications with abundant task-level parallelism and scientific applications with regular and independent units of work. Clever use of pipeline parallelism (DSWP), thread-level speculation (TLS), and speculative pipeline parallelism (Spec-DSWP) can mitigate the costs of inter-thread communication on shared memory multicore machines. This paper presents Distributed Software Multi-threaded Transactional memory (DSMTX), a runtime system which makes these techniques applicable to non-shared memory clusters, allowing them to efficiently address inter-node communication costs. Initial results suggest that DSMTX enables efficient cluster execution of a wider set of application types. For 11 sequential C programs parallelized for a 4-core 32-node (128 total core) cluster without shared memory, DSMTX achieves a geomean speedup of 49x. This compares favorably to the 15x speedup achieved by our implementation of TLS-only support for clusters.