Dynamic distributed scheduling algorithm for state space search

  • Authors:
  • Ankur Narang;Abhinav Srivastava;Ramnik Jain;R. K. Shyamasundar

  • Affiliations:
  • IBM India Research Laboratory, New Delhi, India;IBM India Research Laboratory, New Delhi, India;IBM India Research Laboratory, New Delhi, India;Tata Institute of Fundamental Research, Mumbai, India

  • Venue:
  • Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
  • Year:
  • 2012

Abstract

Petascale computing requires complex runtime systems that must balance load while keeping time and message complexity low when scheduling massive-scale parallel computations. Considering these objectives simultaneously makes online distributed scheduling a very challenging problem. For state space search applications such as UTS, NQueens, Balanced Tree Search, SAT and others, the computations are highly irregular and data dependent. Here, prior scheduling approaches such as [16], [14], [7], and HotSLAW [10], which are predominantly driven by locality-aware work stealing, could lead to low parallel efficiency and scalability along with potentially high stack memory usage. In this paper we present a novel distributed scheduling algorithm (LDSS) for multi-place parallel computations that uses a unique combination of d-choice randomized remote (inter-place) spawns and topology-aware randomized remote work steals to reduce scheduler overheads and dynamically maintain load balance across the compute nodes of the system. Our design was implemented using the GASNet API and POSIX threads. For the UTS (Unbalanced Tree Search) benchmark (using up to 4096 nodes of Blue Gene/P), we deliver the best parallel efficiency (92%) for a 295B-node binomial tree, better than [16] (87%), and demonstrate super-linear speedup on a 1 trillion node geometric tree (the largest studied so far), along with a higher tree node processing rate. We also deliver up to 40% better performance than Charm++. Further, our memory utilization is lower than that of HotSLAW. Moreover, for NQueens (N=18), we demonstrate superior parallel efficiency (92%) compared to Charm++ (85%).
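
The abstract does not spell out how a d-choice randomized remote spawn selects its target place. As a rough illustration only, the generic "power of d choices" idea can be sketched as follows: sample d candidate places at random and send the new task to the least loaded of them. This is not the LDSS implementation (which uses the GASNet API and POSIX threads). The function names, the load metric (a simple task count per place), and the parameters are all assumptions made for this sketch, and the topology-aware work-stealing half of the algorithm is not modelled.

```python
import random


def d_choice_place(loads, d, rng):
    """Hypothetical helper: sample d distinct places, return the least loaded."""
    candidates = rng.sample(range(len(loads)), d)
    return min(candidates, key=lambda place: loads[place])


def spawn_tasks(num_places, num_tasks, d, seed=0):
    """Assign num_tasks one at a time using d-choice placement.

    loads[p] is an assumed load metric: the number of tasks sent to place p.
    Returns the final per-place load list.
    """
    if not 1 <= d <= num_places:
        raise ValueError("d must be between 1 and num_places")
    rng = random.Random(seed)
    loads = [0] * num_places
    for _ in range(num_tasks):
        target = d_choice_place(loads, d, rng)
        loads[target] += 1
    return loads


if __name__ == "__main__":
    # Compare the most loaded place for purely random spawns (d=1)
    # against two-choice spawns (d=2) on the same number of tasks.
    for d in (1, 2):
        loads = spawn_tasks(num_places=64, num_tasks=10_000, d=d, seed=42)
        print(f"d={d}: max load = {max(loads)}, min load = {min(loads)}")
```

With d=1 the scheme degenerates to a uniformly random spawn. Larger d trades extra load probes (messages, in a distributed setting) for a tighter load spread, which is the tension between load balance and message complexity that the abstract describes.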