Improving scalability and fault tolerance in an application management infrastructure

  • Authors: Nikolay Topilski; Jeannie Albrecht; Amin Vahdat
  • Affiliations: University of California, San Diego; Williams College; University of California, San Diego
  • Venue: LASCO'08 First USENIX Workshop on Large-Scale Computing
  • Year: 2008

Abstract

This paper explores the challenges associated with distributed application management in large-scale computing environments. In particular, we investigate several techniques for extending Plush, an existing distributed application management framework, to provide improved scalability and fault tolerance without sacrificing performance. One of the main limitations of Plush is the structure of its underlying communication fabric. We explain how we replaced the existing communication subsystem in Plush with an overlay tree provided by Mace, a toolkit that simplifies the implementation of overlay networks, to improve robustness and scalability.