PonD: dynamic creation of HTC pool on demand using a decentralized resource discovery system

  • Authors:
  • Kyungyong Lee;David Wolinsky;Renato J. Figueiredo

  • Affiliations:
  • University of Florida, Gainesville, FL, USA;Yale University, New Haven, CT, USA;University of Florida, Gainesville, FL, USA

  • Venue:
  • Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
  • Year:
  • 2012

Abstract

High Throughput Computing (HTC) platforms aggregate heterogeneous resources to provide vast amounts of computing power over a long period of time. Typical HTC systems, such as Condor and BOINC, rely on central managers for resource discovery and scheduling. While this approach simplifies deployment, it requires careful system configuration and management to ensure high availability and scalability. In this paper, we present a novel approach that integrates a self-organizing P2P overlay for scalable and timely discovery of resources with unmodified client/server job scheduling middleware in order to create HTC virtual resource Pools on Demand (PonD). This approach decouples resource discovery and scheduling from job execution/monitoring: a job submission dynamically generates an HTC platform based upon resources discovered through match-making from a large "sea" of resources in the P2P overlay and forms a "PonD" capable of leveraging unmodified HTC middleware for job execution and monitoring. We show that the job scheduling time of our approach scales as O(log N), where N is the number of resources in a pool, through first-order analytical models and large-scale simulation results. To verify the practicality of PonD, we have implemented a prototype using Condor (called C-PonD), a structured P2P overlay, and a PonD creation module. Experimental results with the prototype in two WAN environments (PlanetLab and the FutureGrid cloud computing testbed) demonstrate the utility of C-PonD as an HTC approach that does not rely on a central repository for maintaining all resource information. Though the prototype is based on Condor, the decoupled nature of the system components (decentralized resource discovery, PonD creation, and job execution/monitoring) is generally applicable to other grid computing middleware systems.
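To make the O(log N) claim concrete, the following is a minimal sketch of one way a match-making query could be spread over N ordered nodes without a central manager. It is not the algorithm from the paper: the recursive range-bisection scheme, the attribute names (`memory_mb`, `cores`), and the functions `make_pool` and `discover` are all assumptions chosen for illustration. Each node checks the requirement against itself, then hands the rest of its range to two child nodes, so the longest forwarding chain grows logarithmically with the pool size.

```python
import random


def make_pool(n, seed=0):
    """Build a hypothetical pool of n resources with made-up attributes."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "memory_mb": rng.choice([1024, 2048, 4096, 8192]),
            "cores": rng.choice([1, 2, 4, 8]),
        }
        for i in range(n)
    ]


def discover(pool, requirement, lo=0, hi=None, depth=0):
    """Match-make over pool[lo:hi] by recursive range bisection.

    The node at index `lo` evaluates the requirement locally, then
    delegates the remaining range to the first node of each half.
    Returns (ids of matching resources, longest forwarding chain in hops).
    """
    if hi is None:
        hi = len(pool)
    if lo >= hi:
        return [], depth

    matches = [pool[lo]["id"]] if requirement(pool[lo]) else []
    max_depth = depth

    rest_lo = lo + 1
    mid = (rest_lo + hi + 1) // 2  # first half gets the extra node, if any
    for sub_lo, sub_hi in ((rest_lo, mid), (mid, hi)):
        if sub_lo < sub_hi:
            sub_matches, sub_depth = discover(
                pool, requirement, sub_lo, sub_hi, depth + 1
            )
            matches.extend(sub_matches)
            max_depth = max(max_depth, sub_depth)
    return matches, max_depth


if __name__ == "__main__":
    # Hypothetical job requirement: at least 4 GB of memory and 2 cores.
    def needs(r):
        return r["memory_mb"] >= 4096 and r["cores"] >= 2

    for n in (16, 256, 4096):
        found, hops = discover(make_pool(n), needs)
        print(f"N={n:5d}  matches={len(found):5d}  max hops={hops}")
```

In this sketch the hop count for N nodes is floor(log2 N), so growing the pool from 16 to 4096 nodes raises the longest chain from 4 to 12 hops. A real overlay would add network latency, node churn, and result aggregation costs that this toy model ignores.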