Solving large, irregular graph problems efficiently is challenging. Current software systems and commodity multiprocessors do not support fine-grained, irregular parallelism well. We present XWS, the X10 Work Stealing framework, an open-source runtime for the parallel programming language X10 and a library to be used directly by application writers. XWS extends the Cilk work-stealing framework with several features necessary to efficiently implement graph algorithms, viz., support for improperly nested procedures, global termination detection, and phased computation. We also present a strategy to adaptively control the granularity of parallel tasks in the work-stealing scheme, depending on the instantaneous size of the work queue. We compare the performance of the XWS implementations of spanning tree algorithms with that of the hand-written C and Cilk implementations using various graph inputs. We show that XWS programs (written in Java) scale and exhibit comparable or better performance.
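The abstract does not spell out the granularity rule, so the following is only a minimal sketch of one way a worker might size its tasks from the instantaneous length of its own work queue. The class `AdaptiveGranularity`, the methods `batchSize` and `takeBatch`, and the `threshold` and `maxBatch` parameters are all hypothetical names chosen for illustration; none of them is part of the XWS API. The assumed policy is that a short queue yields single-item tasks, so that idle workers still find something to steal, while a long queue yields larger batches that amortize scheduling overhead.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not the XWS implementation.
final class AdaptiveGranularity {

    // Assumed policy: one item per task while the queue is short,
    // then a batch that grows with queue length, capped at maxBatch.
    static int batchSize(int queueLength, int threshold, int maxBatch) {
        if (threshold < 1 || maxBatch < 1) {
            throw new IllegalArgumentException("threshold and maxBatch must be >= 1");
        }
        if (queueLength <= threshold) {
            return 1;
        }
        return Math.min(maxBatch, queueLength / threshold);
    }

    // Removes up to batchSize(...) items from the tail of the worker's own
    // queue (the end the owner works from in a work-stealing deque).
    static <T> List<T> takeBatch(Deque<T> queue, int threshold, int maxBatch) {
        int n = Math.min(queue.size(), batchSize(queue.size(), threshold, maxBatch));
        List<T> batch = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            batch.add(queue.pollLast());
        }
        return batch;
    }

    public static void main(String[] args) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (int v = 0; v < 10; v++) {
            queue.addLast(v);
        }
        // With 10 queued items and threshold 4, the batch holds 10 / 4 = 2 items.
        System.out.println(takeBatch(queue, 4, 16));
    }
}
```

In a real scheduler the queue would be a concurrent deque shared with thieves, and the threshold would need tuning per input graph; this single-threaded sketch shows only the sizing decision.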