A resiliency model for high performance infrastructure based on logical encapsulation

  • Authors:
  • James J. Moore; Carl Kesselman

  • Affiliations:
  • EMC, Los Angeles, CA, USA and University of Southern California, Marina Del Rey, CA, USA; University of Southern California, Marina Del Rey, CA, USA

  • Venue:
  • Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
  • Year:
  • 2012

Abstract

An emerging trend in distributed systems is the creation of dynamically provisioned, heterogeneous high-performance platforms that co-allocate both virtualized computing resources and network-attached storage volumes offering NAS- and SAN-level data services. These high-performance computing environments support parallel applications that perform traditional file system operations. As with any parallel platform, the ability to continue computation in the face of component failures is an important characteristic. Achieving resiliency in heterogeneous environments presents unique challenges and opportunities not found in homogeneous aggregations of computing resources. We present a logical encapsulation model for heterogeneous high-performance infrastructure that enables a reactive resiliency approach for federations of virtual machines and externally hosted physical storage volumes. Asynchronous state capture and restoration models for individual resources are then composed into non-blocking resiliency models for logical encapsulations. Our evaluation demonstrates that this methodology offers greater overall flexibility and significant performance improvements compared with current resiliency approaches in virtualized distributed execution environments.
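The abstract does not specify an interface, so the following is only an illustrative sketch under our own assumptions, not the authors' implementation. It shows one way per-resource asynchronous capture and restoration could be composed into a group-level capture that does not block the caller. `Resource`, `LogicalEncapsulation`, `freeze`, `persist` and `capture_nonblocking` are hypothetical names, and short `asyncio` sleeps stand in for real VM and storage-volume snapshot operations.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical encapsulation member, e.g. a VM or an external storage volume."""
    name: str
    state: dict = field(default_factory=dict)

    def freeze(self) -> dict:
        # Point-in-time copy taken when capture starts (illustrative stand-in
        # for a real VM or volume snapshot primitive).
        return dict(self.state)

    async def persist(self, image: dict) -> dict:
        # Stand-in for a slow, asynchronous write-out of the frozen image.
        await asyncio.sleep(0.01)
        return image

    async def restore(self, image: dict) -> None:
        # Stand-in for asynchronously reloading a saved image.
        await asyncio.sleep(0)
        self.state = dict(image)


@dataclass
class LogicalEncapsulation:
    """Hypothetical grouping of heterogeneous resources treated as one unit."""
    members: list

    def capture_nonblocking(self) -> "asyncio.Task[dict]":
        # Freeze every member in one synchronous step so the images share a
        # single group-wide capture point, then persist them concurrently in
        # the background; the caller gets a task back immediately.
        frozen = {m.name: m.freeze() for m in self.members}

        async def _persist_all() -> dict:
            images = await asyncio.gather(
                *(m.persist(frozen[m.name]) for m in self.members))
            return {m.name: img for m, img in zip(self.members, images)}

        return asyncio.create_task(_persist_all())

    async def restore(self, group_image: dict) -> None:
        # Restore all members concurrently from a previously captured image.
        await asyncio.gather(
            *(m.restore(group_image[m.name]) for m in self.members))


async def demo() -> tuple:
    vm = Resource("vm0", {"step": 10})
    vol = Resource("vol0", {"blocks": 3})
    group = LogicalEncapsulation([vm, vol])

    task = group.capture_nonblocking()  # returns without waiting for persistence
    vm.state["step"] = 11               # computation keeps going meanwhile
    image = await task                  # the captured image still records step 10

    vm.state = {}                       # simulate a failure that loses VM state
    await group.restore(image)
    return image, vm.state


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Freezing all members synchronously before any persistence starts is one simple way to obtain a consistent group-wide capture point while still keeping the expensive write-out off the caller's critical path; real systems would replace the dictionary copies with hypervisor and storage-array snapshot mechanisms.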