Enhancing application robustness in cloud data centers

  • Authors:
  • Madalin Mihailescu; Andres Rodriguez; Cristiana Amza; Dmitrijs Palcikovs; Gabriel Iszlai; Andrew Trossman; Joanna Ng

  • Affiliations:
  • University of Toronto; University of Toronto; University of Toronto; IBM Center of Advanced Studies; IBM Center of Advanced Studies; IBM Center of Advanced Studies; IBM Center of Advanced Studies

  • Venue:
  • Proceedings of the 2011 Conference of the Center for Advanced Studies on Collaborative Research
  • Year:
  • 2011

Abstract

We propose OX, a runtime system that enhances the resilience of cloud applications to infrastructure anomalies by combining application-level availability constraints with application topologies discovered on the fly. OX allows application owners to specify groups of highly available virtual machines (VMs), following component roles and replication semantics. To discover application topologies, OX transparently monitors network traffic among VMs. From this information, OX builds online topology graphs for applications and incrementally partitions these graphs across the infrastructure, both to enforce availability constraints and to optimize communication between VMs. We evaluate OX in a realistic cloud setting using a mix of Hadoop and YCSB/Cassandra workloads, and show that OX increases application robustness by protecting applications from network interference effects and rack-level failures.
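The abstract does not spell out how OX partitions its topology graphs, so the following is only a rough illustration of the idea under stated assumptions. It aggregates observed traffic into an undirected, traffic-weighted VM graph, then places VMs on racks one at a time. Members of the same availability group are never put on the same rack (a hard constraint), and each VM otherwise goes to the rack where it already exchanges the most traffic with placed VMs. The flow-record format, the names `build_topology` and `place`, the per-rack slot capacities, and the greedy heuristic itself are hypothetical stand-ins, not OX's actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical flow record: (source VM, destination VM, bytes observed).
Flow = Tuple[str, str, int]
Graph = Dict[frozenset, int]


def build_topology(flows: Iterable[Flow]) -> Graph:
    """Aggregate observed flows into an undirected, traffic-weighted VM graph."""
    graph: Graph = defaultdict(int)
    for src, dst, nbytes in flows:
        if src != dst:  # ignore loopback traffic
            graph[frozenset((src, dst))] += nbytes
    return dict(graph)


def place(vms: List[str], graph: Graph, ha_groups: List[Set[str]],
          racks: Dict[str, int]) -> Dict[str, str]:
    """Greedily assign VMs to racks (a stand-in heuristic, not OX's algorithm).

    Hard constraint: VMs in the same availability group never share a rack.
    Soft goal: each VM goes to the rack where it exchanges the most traffic
    with VMs that are already placed there.
    """
    free = dict(racks)  # remaining VM slots per rack
    placement: Dict[str, str] = {}

    def volume(vm: str) -> int:
        return sum(w for edge, w in graph.items() if vm in edge)

    # Place the most communication-heavy VMs first; break ties by name.
    for vm in sorted(vms, key=lambda v: (-volume(v), v)):
        peers = set().union(*(g for g in ha_groups if vm in g)) - {vm}
        best_rack: Optional[str] = None
        best_score = -1
        for rack in sorted(free):
            if free[rack] == 0:
                continue
            if any(placement.get(p) == rack for p in peers):
                continue  # would co-locate members of the same availability group
            # Traffic between this VM and neighbors already placed on this rack.
            score = sum(w for edge, w in graph.items()
                        if vm in edge and placement.get(next(iter(edge - {vm}))) == rack)
            if score > best_score:
                best_rack, best_score = rack, score
        if best_rack is None:
            raise ValueError(f"no feasible rack for {vm}")
        placement[vm] = best_rack
        free[best_rack] -= 1
    return placement


if __name__ == "__main__":
    flows = [("web-1", "db-1", 500), ("web-2", "db-2", 400), ("db-1", "db-2", 10)]
    print(place(["web-1", "web-2", "db-1", "db-2"], build_topology(flows),
                ha_groups=[{"db-1", "db-2"}, {"web-1", "web-2"}],
                racks={"rack-a": 2, "rack-b": 2}))
```

With these inputs the heuristic puts db-1 and web-1 on rack-a and db-2 and web-2 on rack-b: the replicas of each group end up on different racks, so a single rack failure leaves one copy of each running, while each heavily communicating web/database pair stays rack-local. A one-pass greedy placement like this can miss a feasible assignment that does exist (it raises `ValueError` instead), so a real system would likely need backtracking, re-partitioning as the graph evolves, or an exact partitioner.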