Shared work list: hacking amorphous data parallelism in UPC

Authors:
Shixiong Xu; Li Chen
Affiliations:
Chinese Academy of Sciences, Beijing, China; Chinese Academy of Sciences, Beijing, China
Venue:
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Year:
2012

Abstract

"Irregular" algorithms that use data structures such as sparse graphs, trees and sets are prevalent in most emerging problem domains, including social network analysis, machine learning, data mining and computational science. The irregularity of the underlying data structures leads to unstructured parallelism in these algorithms, which makes it hard for users to write efficient parallel implementations on distributed memory systems. The Unified Parallel C (UPC) language combines the convenience of a global address space with the locality control needed for high performance and scalability. However, because its Single Program Multiple Data (SPMD) execution model uses a statically fixed set of executing threads, UPC does not support applications with unstructured parallelism. In this paper, we first introduce the Shared Work List to UPC and advocate a programming paradigm for writing applications with amorphous data parallelism on distributed memory systems. We also introduce user-assisted speculative execution, based on the Active Message model, to support speculative execution on distributed memory systems. An efficient work-dispatching mechanism and related optimizations are presented as well. As a preliminary case study, we choose Breadth-first Search to demonstrate the feasibility, programmability and performance benefits of the Shared Work List.
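
For readers unfamiliar with the work-list style of programming, the following is a minimal sequential sketch of a work-list-driven Breadth-first Search. It is written in Python rather than UPC, runs in a single process, and does not reproduce the paper's Shared Work List interface, its work dispatching, or its speculative execution. The names `worklist_bfs`, `adjacency`, and `worklist` are hypothetical and chosen only for illustration.

```python
from collections import deque


def worklist_bfs(adjacency, source):
    """Return a dict mapping each reachable node to its BFS level.

    Illustrative assumption: `adjacency` is a dict from node to a list of
    neighbors. This is a sequential stand-in, not the paper's UPC API.
    """
    level = {source: 0}
    worklist = deque([source])  # pending work items (active nodes)
    while worklist:
        node = worklist.popleft()  # take one work item
        for neighbor in adjacency.get(node, ()):
            if neighbor not in level:  # first visit fixes the BFS level
                level[neighbor] = level[node] + 1
                worklist.append(neighbor)  # newly created work goes back on the list
    return level


# Small example graph (hypothetical data, for illustration only).
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
levels = worklist_bfs(graph, 0)
print(levels)
```

The point of the sketch is that the amount of work is not known up front: processing one node may add further nodes to the list. That is the property that is hard to express with a statically fixed set of SPMD threads, and a distributed version would additionally need to share the list across threads and resolve conflicting updates.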