A self-organizing flock of Condors

  • Authors: Ali R. Butt; Rongmei Zhang; Y. Charlie Hu
  • Affiliation (all authors): School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA
  • Venue: Journal of Parallel and Distributed Computing
  • Year: 2006

Abstract

Condor enables high-throughput computing using cost-effective, off-the-shelf components. It also supports flocking, a mechanism for sharing resources among Condor pools. Because Condor pools distributed over a wide area can have dynamically changing availability and sharing preferences, the current flocking mechanism, which is based on static configurations, can limit the potential for sharing resources across Condor pools. This paper presents a technique for resource discovery in distributed Condor pools using peer-to-peer mechanisms that are self-organizing, fault-tolerant, scalable, and locality-aware. Locality awareness guarantees that applications are not shipped across long distances when nearby resources are available. Measurements using a synthetic job trace show that self-organized flocking reduces the maximum time a job waits in the queue of a heavily loaded pool by a factor of 10 compared with no flocking. Simulations of 1000 Condor pools are also presented, and the results confirm that the technique discovers and uses nearby resources in the physical network.
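
The locality-aware selection described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation or Condor's API: the `PoolAd` record, its fields, and the `choose_flock_target` function are hypothetical names introduced here, and network distance is assumed to be available as a measured round-trip time. The sketch only shows the idea of preferring the nearest pool that currently has enough free machines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PoolAd:
    """Hypothetical availability announcement from a remote pool."""
    name: str
    free_machines: int   # machines the pool is currently willing to share
    rtt_ms: float        # assumed proxy for network distance to that pool


def choose_flock_target(ads: List[PoolAd], needed: int) -> Optional[PoolAd]:
    """Return the nearest pool that can supply `needed` machines, or None."""
    # Keep only pools that currently advertise enough free machines.
    candidates = [ad for ad in ads if ad.free_machines >= needed]
    # Among those, prefer the pool with the smallest network distance.
    return min(candidates, key=lambda ad: ad.rtt_ms, default=None)


# Illustrative data only; these pools and numbers are made up.
ads = [
    PoolAd("pool-far", free_machines=50, rtt_ms=120.0),
    PoolAd("pool-near", free_machines=8, rtt_ms=5.0),
    PoolAd("pool-busy", free_machines=0, rtt_ms=1.0),
]
```

With this data, a request for 4 machines goes to `pool-near`, since `pool-busy` is closer but has nothing to share. A request for 20 machines falls back to `pool-far` because no nearby pool can satisfy it.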