"Irregular" algorithms built on data structures such as sparse graphs, trees, and sets are prevalent in many emerging problem domains, including social network analysis, machine learning, data mining, and computational science. The irregularity of the underlying data structures leads to unstructured parallelism in these algorithms, which makes it hard for users to write efficient parallel implementations on distributed memory systems. The Unified Parallel C (UPC) language provides the convenience of a global address space together with the locality control needed for high performance and scalability. However, its Single Program Multiple Data (SPMD) execution model, with a statically fixed set of executing threads, means that UPC does not support applications with unstructured parallelism. In this paper, we first introduce the Shared Work List to UPC and advocate a programming paradigm for writing applications with amorphous data parallelism on distributed memory systems. We also introduce user-assisted speculative execution, based on the Active Message model, to support speculative execution on distributed memory systems. An efficient work-dispatching mechanism and related optimizations are presented as well. As a preliminary case study, we use Breadth-First Search to demonstrate the feasibility, programmability, and performance benefits of the Shared Work List.
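To make the work-list paradigm concrete, the following is a minimal single-process sketch in plain C of a Breadth-First Search driven by a work list. It is not the paper's Shared Work List and uses no UPC constructs: the `work_list` type, the `wl_*` helpers, `bfs`, and the `MAX_VERTICES` bound are all hypothetical names chosen for illustration, and the graph is assumed to be stored in compressed sparse row (CSR) form. In a distributed setting the list would be shared across threads and items dispatched between them; here a local FIFO stands in for it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VERTICES 64  /* illustrative capacity, not from the paper */

/* Hypothetical work list: a fixed-capacity FIFO of vertex ids. */
typedef struct {
    int items[MAX_VERTICES];
    size_t head;  /* index of the next item to pop */
    size_t tail;  /* index of the next free slot */
} work_list;

void wl_init(work_list *wl) { wl->head = wl->tail = 0; }

int wl_empty(const work_list *wl) { return wl->head == wl->tail; }

void wl_push(work_list *wl, int v) {
    assert(wl->tail < MAX_VERTICES);  /* each vertex is pushed at most once */
    wl->items[wl->tail++] = v;
}

int wl_pop(work_list *wl) {
    assert(!wl_empty(wl));
    return wl->items[wl->head++];
}

/* BFS over a CSR graph with n vertices: the neighbours of vertex v are
 * col[row[v]] .. col[row[v + 1] - 1]. On return, level[v] holds the BFS
 * depth of v, or -1 if v is unreachable from source. */
void bfs(int n, const int *row, const int *col, int source, int *level) {
    work_list wl;
    assert(n <= MAX_VERTICES);
    wl_init(&wl);
    for (int v = 0; v < n; v++) {
        level[v] = -1;
    }
    level[source] = 0;
    wl_push(&wl, source);

    /* Each work item is a vertex; processing it may create new items. */
    while (!wl_empty(&wl)) {
        int u = wl_pop(&wl);
        for (int e = row[u]; e < row[u + 1]; e++) {
            int w = col[e];
            if (level[w] < 0) {      /* first visit to w */
                level[w] = level[u] + 1;
                wl_push(&wl, w);     /* newly discovered work */
            }
        }
    }
}
```

The loop shows why such algorithms have unstructured parallelism: the amount of work is not known up front, because processing one item generates further items depending on the graph's shape.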