Predicting execution bottlenecks in map-reduce clusters

  • Authors:
  • Edward Bortnikov; Ari Frank; Eshcar Hillel; Sriram Rao

  • Affiliations:
  • Yahoo! Labs, Haifa, Israel; Affectivon Inc, Kiryat Tivon, Israel; Yahoo! Labs, Haifa, Israel; Yahoo! Labs, Santa Clara, US

  • Venue:
  • HotCloud'12: Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Computing
  • Year:
  • 2012

Abstract

Extremely slow, or straggler, tasks are a major performance bottleneck in map-reduce systems. The Hadoop infrastructure makes an effort both to avoid them (by minimizing remote data accesses) and to handle them at runtime (through speculative execution). However, the mechanisms in place neither guarantee the avoidance of performance hotspots in task scheduling nor provide an easy way to tune the timely detection of stragglers. We suggest a machine-learning approach to address these problems and introduce a slowdown predictor: an oracle that forecasts how much slower a task will run on a given node compared to similar tasks. Slowdown predictors can be embedded in the map-reduce infrastructure to improve the agility and timeliness of scheduling decisions. We provide an initial evaluation to demonstrate the viability of our approach and discuss use cases for the new paradigm.
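The abstract does not specify the model, features, or exact slowdown definition, so the following is only an illustrative sketch of the slowdown-predictor interface under stated assumptions. It assumes a task's slowdown is its runtime divided by the median runtime of tasks in the same job, and it substitutes a naive per-node averaging baseline for the paper's machine-learning predictor. All names (`slowdowns`, `NaiveSlowdownPredictor`, `pick_node`, `is_likely_straggler`) and the 1.5 threshold are hypothetical and are not taken from the paper or from Hadoop.

```python
from collections import defaultdict
from statistics import mean, median


def slowdowns(records):
    """ASSUMPTION: slowdown = task runtime / median runtime of tasks in the same job.

    records: iterable of (job_id, node, runtime_sec) tuples.
    Returns a list of (node, slowdown) pairs.
    """
    records = list(records)
    by_job = defaultdict(list)
    for job, _node, runtime in records:
        by_job[job].append(runtime)
    job_median = {job: median(rts) for job, rts in by_job.items()}
    return [(node, runtime / job_median[job]) for job, node, runtime in records]


class NaiveSlowdownPredictor:
    """Hypothetical baseline (not the paper's ML model): a node's predicted
    slowdown is the mean of the slowdowns previously observed on it."""

    def __init__(self, default=1.0):
        self.default = default          # prediction for nodes with no history
        self.history = defaultdict(list)

    def fit(self, records):
        for node, s in slowdowns(records):
            self.history[node].append(s)
        return self

    def predict(self, node):
        past = self.history.get(node)   # .get avoids creating empty entries
        return mean(past) if past else self.default


def pick_node(predictor, candidates):
    """Scheduling use: place the task on the node with the lowest predicted slowdown."""
    return min(candidates, key=predictor.predict)


def is_likely_straggler(predictor, node, threshold=1.5):
    """Detection use: flag a task early if its node is predicted to be slow."""
    return predictor.predict(node) > threshold


# Toy history: node "n3" runs tasks about 3x slower than its peers.
history = [
    ("j1", "n1", 10.0), ("j1", "n2", 10.0), ("j1", "n3", 30.0),
    ("j2", "n1", 5.0), ("j2", "n3", 15.0), ("j2", "n2", 5.0),
]
predictor = NaiveSlowdownPredictor().fit(history)
```

With this toy history, `predictor.predict("n3")` is 3.0, `pick_node(predictor, ["n3", "n1"])` returns `"n1"`, and `is_likely_straggler(predictor, "n3")` is `True`. A learned model would replace `predict` with a function of task and node features, while the two call sites (placement and early straggler detection) stay the same.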