Algorithmic modifications to the Jacobi-Davidson parallel eigensolver to dynamically balance external CPU and memory load

  • Authors:
  • Richard Tran Mills;Andreas Stathopoulos;Evgenia Smirni

  • Affiliations:
  • Department of Computer Science, College of William and Mary, Williamsburg, Virginia;Department of Computer Science, College of William and Mary, Williamsburg, Virginia;Department of Computer Science, College of William and Mary, Williamsburg, Virginia

  • Venue:
  • ICS '01 Proceedings of the 15th international conference on Supercomputing
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

Clusters of workstations (COWs) and SMPs have become popular and cost effective means of solving scientific problems. Because such environments may be heterogenous and/or time shared, dynamic load balancing is central to achieving high performance. Our thesis is that new levels of sophistication are required in parallel algorithm design and in the interaction of the algorithms with the runtime system. To support this thesis, we illustrate a novel approach for application-level balancing of external CPU and memory load on parallel iterative methods that employ some form of local preconditioning on each node. There are two key ideas. First, because all nodes need not perform their portion of the preconditioning phase to the same accuracy, the code can achieve perfect loadbalance, dynamically adapting to external CPU load, if we stop the preconditioning phase on all processors after a fixed amount of time. Second, if the program detects memory thrashing on a node, it recedes its preconditioning phase from that node, hopefully speeding the completion of competing jobs hence the relinquishing of their resources. We have implemented our load balancing approach in a state-of-the-art, coarse grain parallel Jacobi-Davidson eigensolver. Experimental results show that the new method adapts its algorithm based on runtime system information, without compromising the overall convergence behavior. We demonstrate the effectiveness of the new algorithm in a COW environment under (a) variable CPU load and (b) variable memory availability caused by competing applications.