Limplock: understanding the impact of limpware on scale-out cloud systems

  • Authors:
  • Thanh Do;Mingzhe Hao;Tanakorn Leesatapornwongsa;Tiratat Patana-anake;Haryadi S. Gunawi

  • Affiliations:
  • University of Wisconsin-Madison;University of Chicago;University of Chicago;University of Chicago;University of Chicago

  • Venue:
  • Proceedings of the 4th annual Symposium on Cloud Computing
  • Year:
  • 2013

Abstract

We highlight one often-overlooked cause of performance failure: limpware -- "limping" hardware whose performance degrades significantly compared to its specification. We report anecdotes of degraded disks and network components seen in large-scale production. To measure the system-level impact of limpware, we assembled limpbench, a set of benchmarks that combine data-intensive load and limpware injections. We benchmark five cloud systems (Hadoop, HDFS, ZooKeeper, Cassandra, and HBase) and find that limpware can severely impact distributed operations, nodes, and an entire cluster. From this, we introduce the concept of limplock, a situation where a system progresses slowly due to the presence of limpware and is not capable of failing over to healthy components. We show how each cloud system that we analyze can exhibit operation, node, and cluster limplock. We conclude that many cloud systems are not limpware tolerant.
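The definition of limplock can be illustrated with a toy model. The sketch below is not taken from the paper or from limpbench; the Node class, the bandwidth figures, and both failover policies are hypothetical, chosen only to show why a system that reacts to dead components but not to slow ones can remain stuck behind limpware.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alive: bool            # still answers heartbeats
    bandwidth_mbps: float  # hypothetical effective bandwidth


def pipeline_throughput(nodes):
    """A replication pipeline moves data no faster than its slowest member."""
    return min(n.bandwidth_mbps for n in nodes)


def liveness_failover(nodes):
    """Fail-stop style policy: exclude only nodes that stopped responding."""
    return [n for n in nodes if n.alive]


def limpware_aware_failover(nodes, spec_mbps, min_fraction=0.1):
    """Hypothetical policy: also exclude nodes running far below specification."""
    threshold = min_fraction * spec_mbps
    return [n for n in nodes if n.alive and n.bandwidth_mbps >= threshold]


SPEC_MBPS = 1000.0  # assumed nominal link speed, for illustration only

# Node "b" is limping: it is alive, but runs at a tiny fraction of its spec.
nodes = [
    Node("a", True, 1000.0),
    Node("b", True, 1.0),
    Node("c", True, 1000.0),
]

after_liveness = liveness_failover(nodes)
after_aware = limpware_aware_failover(nodes, SPEC_MBPS)

# With liveness checks alone, the limping node stays in the pipeline.
limplocked = pipeline_throughput(after_liveness)
# With the hypothetical performance check, the pipeline fails over.
recovered = pipeline_throughput(after_aware)

print(f"liveness-only failover: {limplocked} Mbps")
print(f"limpware-aware failover: {recovered} Mbps")
```

In this model the limping node never misses a heartbeat, so the liveness-only policy keeps it and the whole pipeline runs at the degraded node's speed. The 10% threshold in the second policy is an arbitrary assumption, not a recommendation from the authors.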