Heuristics: intelligent search strategies for computer problem solving
Heuristics: intelligent search strategies for computer problem solving
A dynamic scheduling strategy for the Chare-Kernel system
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
An almost perfect heuristic for the N nonattacking queens problem
Information Processing Letters
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
The data locality of work stealing
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
The power of two choices in randomized load balancing
The power of two choices in randomized load balancing
Adaptive and reliable parallel computing on networks of workstations
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Scioto: A Framework for Global-View Task Parallelism
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Scalable Dynamic Load Balancing Using UPC
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers
ICPPW '10 Proceedings of the 2010 39th International Conference on Parallel Processing Workshops
Lifeline-based global load balancing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Work stealing for multi-core HPC clusters
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
An Adaptive Framework for Large-Scale State Space Search
IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Hi-index | 0.00 |
Petascale computing requires complex runtime systems that need to consider load balancing along with low time and message complexity for scheduling massive scale parallel computations. Simultaneous consideration of these objectives makes online distributed scheduling a very challenging problem. For state space search applications such as UTS, NQueens, Balanced Tree Search, SAT and others, the computations are highly irregular and data dependent. Here, prior scheduling approaches such as [16], [14], [7], HotSLAW [10], which are dominantly locality-aware work-stealing driven, could lead to low parallel efficiency and scalability along with potentially high stack memory usage. In this paper we present a novel distributed scheduling algorithm (LDSS) for multi-place parallel computations, that uses an unique combination of d-choice randomized remote (inter-place) spawns and topology-aware randomized remote work steals to reduce the overheads in the scheduler and dynamically maintain load balance across the compute nodes of the system. Our design was implemented using GASNet API and POSIX threads. For the UTS (Unbalanced Tree Search) benchmark (using upto 4096 nodes of Blue Gene/P), we deliver the best parallel efficiency (92%) for 295B node binomial tree, better than [16] (87%) and demonstrate super-linear speedup on 1 Trillion node (largest studied so far) geometric tree along with higher tree node processing rate. We also deliver upto 40% better performance than Charm++. Further, our memory utilization is lower compared to HotSLAW. Moreover, for NQueens (N=18), we demonstrate superior parallel efficiency (92%) as compared Charm++ (85%).