Dynamic distributed scheduling algorithm for state space search

Authors:
Ankur Narang;Abhinav Srivastava;Ramnik Jain;R. K. Shyamasundar
Affiliations:
IBM India Research Laboratory, New Delhi, India;IBM India Research Laboratory, New Delhi, India;IBM India Research Laboratory, New Delhi, India;Tata Institute of Fundamental Research, Mumbai, India
Venue:
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Year:
2012

References:

Heuristics: intelligent search strategies for computer problem solving
A dynamic scheduling strategy for the Chare-Kernel system. Proceedings of the 1989 ACM/IEEE conference on Supercomputing
An almost perfect heuristic for the N nonattacking queens problem. Information Processing Letters
Thread scheduling for multiprogrammed multiprocessors. Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM)
The data locality of work stealing. Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
The power of two choices in randomized load balancing
Adaptive and reliable parallel computing on networks of workstations. ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Multiprocessing of Combinatorial Search Problems. Computer
Scioto: A Framework for Global-View Task Parallelism. ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Scalable Dynamic Load Balancing Using UPC. ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Scalable work stealing. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers. ICPPW '10 Proceedings of the 2010 39th International Conference on Parallel Processing Workshops
Lifeline-based global load balancing. Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Work stealing for multi-core HPC clusters. Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
An Adaptive Framework for Large-Scale State Space Search. IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum

Abstract

Petascale computing requires complex runtime systems that must balance load while keeping time and message complexity low when scheduling massive-scale parallel computations. Meeting these objectives simultaneously makes online distributed scheduling a very challenging problem. For state space search applications such as UTS (Unbalanced Tree Search), NQueens, Balanced Tree Search, SAT and others, the computations are highly irregular and data dependent. For such workloads, prior scheduling approaches such as [16], [14], [7] and HotSLAW [10], which are predominantly driven by locality-aware work stealing, can lead to low parallel efficiency and scalability along with potentially high stack memory usage. In this paper we present a novel distributed scheduling algorithm (LDSS) for multi-place parallel computations that uses a unique combination of d-choice randomized remote (inter-place) spawns and topology-aware randomized remote work steals to reduce scheduler overheads and dynamically maintain load balance across the compute nodes of the system. Our design was implemented using the GASNet API and POSIX threads. For the UTS benchmark (using up to 4096 nodes of Blue Gene/P), we deliver the best parallel efficiency (92%) for a 295B-node binomial tree, better than [16] (87%), and demonstrate super-linear speedup on a 1 trillion node geometric tree (the largest studied so far), along with a higher tree node processing rate. We also deliver up to 40% better performance than Charm++. Further, our memory utilization is lower than that of HotSLAW. Moreover, for NQueens (N=18), we demonstrate superior parallel efficiency (92%) compared to Charm++ (85%).
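
The d-choice randomized spawn mentioned in the abstract builds on the general "power of d choices" idea from randomized load balancing: sample d places at random and send the new task to the least loaded of them. The following is a minimal sketch of that generic idea only, not the LDSS algorithm itself. The function names `d_choice_place` and `simulate`, the use of a simple task count as the load measure, and the absence of any topology awareness or work stealing are all assumptions made for illustration.

```python
import random


def d_choice_place(loads, d, rng):
    """Sample d distinct places at random and return the least loaded one.

    Hypothetical helper: `loads` is a list of per-place task counts,
    which stands in for whatever load metric a real scheduler would use.
    """
    candidates = rng.sample(range(len(loads)), d)
    return min(candidates, key=lambda place: loads[place])


def simulate(num_places, num_tasks, d, seed=0):
    """Spawn num_tasks tasks one by one using d-choice placement."""
    rng = random.Random(seed)
    loads = [0] * num_places
    for _ in range(num_tasks):
        target = d_choice_place(loads, d, rng)
        loads[target] += 1  # the spawned task lands on the chosen place
    return loads


if __name__ == "__main__":
    for d in (1, 2, 4):
        loads = simulate(num_places=64, num_tasks=6400, d=d)
        print(f"d={d}: max load {max(loads)}, min load {min(loads)}")
```

With d=1 this reduces to purely random placement. Larger d typically narrows the gap between the most and least loaded place, at the cost of probing more places per spawn, which in a distributed setting would mean more messages.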