A repartitioning hypergraph model for dynamic load balancing

  • Authors:
  • Umit V. Catalyurek; Erik G. Boman; Karen D. Devine; Doruk Bozdağ; Robert T. Heaphy; Lee Ann Riesen

  • Affiliations:
  • The Ohio State University, Department of Biomedical Informatics, Columbus, OH 43210, United States and The Ohio State University, Department of Electrical and Computer Engineering, Columbus, OH 43 ...; Sandia National Laboratories, Department of Scalable Algorithms, Albuquerque, NM 87185, United States (footnote 1: Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin company ...); Sandia National Laboratories, Department of Scalable Algorithms, Albuquerque, NM 87185, United States (footnote 1); The Ohio State University, Department of Electrical and Computer Engineering, Columbus, OH 43210, United States; Sandia National Laboratories, Department of Scalable Algorithms, Albuquerque, NM 87185, United States (footnote 1); Sandia National Laboratories, Department of Scalable Algorithms, Albuquerque, NM 87185, United States (footnote 1)

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2009

Quantified Score

Hi-index 0.01

Abstract

In parallel adaptive applications, the computational structure changes over time, leading to load imbalances even when the initial load distribution was balanced. To restore balance and keep communication volume low in subsequent iterations, the changed computational structure must be dynamically load balanced (repartitioned). Repartitioning differs from static load balancing (partitioning) in that it must also minimize the migration cost of moving data from the existing partition to the new one. In this paper, we present a novel repartitioning hypergraph model for dynamic load balancing that accounts for both the communication volume in the application and the migration cost of moving data, in order to minimize the overall cost. Using a hypergraph-based model lets us represent communication costs accurately rather than approximate them, as graph-based models do. We show that the new model can be realized using hypergraph partitioning with fixed vertices, and we describe our parallel multilevel implementation within the Zoltan load balancing toolkit. To the best of our knowledge, this is the first implementation of dynamic load balancing based on hypergraph partitioning. To demonstrate the effectiveness of our approach, we conducted experiments on a Linux cluster with 1024 processors. The results show that, in terms of reducing total cost, our new model compares favorably to graph-based dynamic load balancing approaches, and that multilevel approaches improve repartitioning quality significantly.
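
The abstract does not spell out how fixed vertices can encode migration cost, so the following is only an illustrative sketch under stated assumptions, not the Zoltan implementation. It assumes one extra vertex per part, fixed to that part, plus a two-pin "migration net" tying each application vertex to the fixed vertex of the part that currently owns it. It also assumes communication net costs are scaled by a factor `alpha` (taken here to be the number of iterations between rebalancing steps) and that the cut is measured as cost times (parts spanned minus one). All names (`Net`, `build_repartitioning_hypergraph`, `total_cost`, `alpha`, `data_size`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    pins: tuple   # ids of the vertices this net connects
    cost: float   # charged once per extra part the net spans


def build_repartitioning_hypergraph(comm_nets, old_part, data_size, k, alpha):
    """Augment a communication hypergraph with migration nets (assumed construction).

    comm_nets : list of Net over application vertices 0..n-1
    old_part  : old_part[v] is the part that currently owns vertex v
    data_size : data_size[v] is the cost of moving vertex v
    k         : number of parts
    alpha     : assumed weight of communication relative to migration
    """
    n = len(old_part)
    # One extra vertex per part (ids n..n+k-1), each fixed to its own part.
    fixed = {n + p: p for p in range(k)}
    # Communication nets keep their pins; their cost is scaled by alpha.
    nets = [Net(net.pins, alpha * net.cost) for net in comm_nets]
    # A migration net is cut exactly when vertex v leaves its old part.
    nets += [Net((v, n + old_part[v]), data_size[v]) for v in range(n)]
    return nets, fixed


def total_cost(nets, fixed, new_part):
    """Sum of cost * (number of parts spanned - 1) over all nets."""
    part_of = dict(enumerate(new_part))
    part_of.update(fixed)  # fixed vertices never move
    return sum(
        net.cost * (len({part_of[v] for v in net.pins}) - 1) for net in nets
    )


# Toy instance: 4 vertices, 2 parts, old distribution is 3-to-1 imbalanced.
comm_nets = [Net((0, 1, 2), 1.0), Net((2, 3), 1.0)]
old_part = [0, 0, 0, 1]
data_size = [1.0, 1.0, 5.0, 1.0]
nets, fixed = build_repartitioning_hypergraph(
    comm_nets, old_part, data_size, k=2, alpha=10
)

# Two balanced candidates: move vertex 2, or move vertex 1.
cost_move_2 = total_cost(nets, fixed, [0, 0, 1, 1])
cost_move_1 = total_cost(nets, fixed, [0, 1, 0, 1])
```

In this toy instance, moving vertex 2 pays more to migrate but cuts only one communication net, while moving vertex 1 is cheap to migrate but cuts both. A single cut objective over the augmented hypergraph captures that trade-off, which is the kind of combined cost the abstract describes.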