CORBA Based Runtime Support for Load Distribution and Fault Tolerance

  • Authors:
  • Thomas Barth;Gerd Flender;Bernd Freisleben;Manfred Grauer;Frank Thilo

  • Venue:
  • IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
  • Year:
  • 2000

Abstract

Parallel scientific computing in a distributed computing environment based on CORBA requires two additional services not (yet) included in the CORBA specification: load distribution and fault tolerance. Both are essential for long-running applications with high computational demands, such as computational engineering applications. The proposed approach integrates load distribution into the CORBA naming service, which in turn relies on information provided by the underlying WINNER resource management system, developed for typical networked Unix workstation environments. Fault tolerance is supported through error detection and backward recovery: proxy objects manage checkpointing and the restart of services in case of failure. A prototypical implementation of the complete system is presented, and performance results obtained for the parallel optimization of a mathematical benchmark function are discussed.
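
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and uses no CORBA or WINNER API: the names LoadAwareNaming, CheckpointingProxy, ServiceFailure, and the host names and load values are all hypothetical, and a plain dictionary stands in for the load information a resource manager would supply. The sketch assumes a naming lookup that returns the replica on the least-loaded host, and a proxy that checkpoints state after every successful step and, on failure, rolls back to the last checkpoint and restarts on another host.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


class ServiceFailure(Exception):
    """Raised by a (simulated) service replica that has failed."""


@dataclass
class LoadAwareNaming:
    """Toy naming service: resolve() picks the replica on the least-loaded host."""
    load: Dict[str, float]  # host -> load; hypothetical stand-in for resource manager data
    bindings: Dict[str, List[str]] = field(default_factory=dict)  # name -> hosts

    def bind(self, name: str, host: str) -> None:
        self.bindings.setdefault(name, []).append(host)

    def resolve(self, name: str, exclude: Iterable[str] = ()) -> str:
        excluded = set(exclude)
        candidates = [h for h in self.bindings.get(name, []) if h not in excluded]
        if not candidates:
            raise LookupError(f"no usable replica bound to {name!r}")
        return min(candidates, key=lambda h: self.load[h])


class CheckpointingProxy:
    """Toy proxy: checkpoints after each step, restarts elsewhere on failure."""

    def __init__(self, naming: LoadAwareNaming, name: str,
                 step: Callable[[str, int], int]) -> None:
        self.naming = naming
        self.name = name
        self.step = step          # step(host, state) -> new state; may raise ServiceFailure
        self.checkpoint = 0       # last state known to be good
        self.failed: set = set()  # hosts on which a failure was detected
        self.host = naming.resolve(name)

    def run(self, iterations: int) -> int:
        state, done = self.checkpoint, 0
        while done < iterations:
            try:
                state = self.step(self.host, state)
            except ServiceFailure:
                # Error detected: backward recovery to the last checkpoint,
                # then restart on the least-loaded host that has not failed.
                self.failed.add(self.host)
                self.host = self.naming.resolve(self.name, self.failed)
                state = self.checkpoint
                continue
            self.checkpoint = state  # take a checkpoint after each successful step
            done += 1
        return state


def flaky_step(host: str, state: int) -> int:
    """Simulated work item: the replica on 'ws1' fails once state reaches 2."""
    if host == "ws1" and state >= 2:
        raise ServiceFailure(host)
    return state + 1


naming = LoadAwareNaming(load={"ws1": 0.1, "ws2": 0.7, "ws3": 0.4})
for h in ("ws1", "ws2", "ws3"):
    naming.bind("optimizer", h)

proxy = CheckpointingProxy(naming, "optimizer", flaky_step)
result = proxy.run(5)
print(result, proxy.host)
```

In this sketch the client only ever talks to the proxy, so the switch from one host to another is invisible to it. Checkpointing after every step keeps the example short; a real system would more likely checkpoint at coarser intervals to limit overhead.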