We introduce Scioto, Shared Collections of Task Objects, a lightweight framework for providing task management on distributed memory machines under one-sided and global-view parallel programming models. Scioto provides locality-aware dynamic load balancing and interoperates with MPI, ARMCI, and Global Arrays. Additionally, Scioto's task model and programming interface are compatible with many other existing parallel models, including UPC, SHMEM, and CAF. Through task parallelism, the Scioto framework provides a solution for overcoming irregularity, load imbalance, and heterogeneity, as well as for dynamically mapping computation onto emerging architectures. In this paper, we present the design and implementation of the Scioto framework and demonstrate its effectiveness on the Unbalanced Tree Search (UTS) benchmark and two quantum chemistry codes: the closed-shell Self-Consistent Field (SCF) method and a sparse tensor contraction kernel extracted from a coupled cluster computation. We explore the efficiency and scalability of Scioto through these sample applications and demonstrate that it offers low overhead, achieves good performance on heterogeneous and multicore clusters, and scales to hundreds of processors.
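To make the idea of a shared collection of task objects with locality-aware dynamic load balancing more concrete, the following is a minimal single-process sketch in Python. It is not the Scioto API: the names `TaskCollection`, `add`, `next_task`, `process`, and `run_demo` are hypothetical. The abstract does not describe the balancing mechanism, so the sketch assumes work stealing as one common way to realize it. Real distributed memory communication (for example over ARMCI or MPI) is replaced by plain in-memory queues.

```python
from collections import deque


class TaskCollection:
    """Toy, single-process model of a shared collection of task objects.

    All names are hypothetical; this is not the Scioto API.
    """

    def __init__(self, num_procs):
        # One double-ended queue of tasks per simulated process.
        self.queues = [deque() for _ in range(num_procs)]

    def add(self, task, affinity):
        # Locality hint: place the task on the process that owns its data.
        self.queues[affinity].append(task)

    def next_task(self, rank):
        # Prefer local work, newest first, to stay close to recently used data.
        if self.queues[rank]:
            return self.queues[rank].pop()
        # Assumed policy: steal the oldest task from the most loaded process.
        victim = max(range(len(self.queues)), key=lambda r: len(self.queues[r]))
        if self.queues[victim]:
            return self.queues[victim].popleft()
        return None  # The whole collection is empty.


def process(tc, num_procs):
    """Run all tasks, visiting the simulated processes round-robin.

    Returns the task results and the number of tasks each process ran.
    """
    results, executed = [], [0] * num_procs
    progress = True
    while progress:
        progress = False
        for rank in range(num_procs):
            task = tc.next_task(rank)
            if task is not None:
                results.append(task())
                executed[rank] += 1
                progress = True
    return results, executed


def run_demo():
    num_procs = 4
    tc = TaskCollection(num_procs)
    # Deliberately imbalanced start: all 12 tasks have affinity to process 0.
    for i in range(12):
        tc.add(lambda i=i: i * i, affinity=0)
    return process(tc, num_procs)
```

Calling `run_demo()` starts with every task queued on process 0. Because idle processes steal from the loaded one, each of the four simulated processes ends up running three tasks. Taking local work from one end of the queue and stolen work from the other is a common design choice in work-stealing schedulers. It is used here as an assumption, not as a statement about how Scioto is implemented.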