Investigation of data locality and fairness in MapReduce

  • Authors:
  • Zhenhua Guo; Geoffrey Fox; Mo Zhou

  • Affiliations:
  • Indiana University, Bloomington, IN, USA; Indiana University, Bloomington, IN, USA; Indiana University, Bloomington, IN, USA

  • Venue:
  • Proceedings of third international workshop on MapReduce and its Applications Date
  • Year:
  • 2012

Abstract

In data-intensive computing, MapReduce is an important tool that allows users to process large amounts of data easily. Its data-locality-aware scheduling strategy exploits the locality of data access to minimize data movement and thus reduce network traffic. In this paper, we first analyze the state-of-the-art MapReduce scheduling algorithms and demonstrate that they do not guarantee optimal scheduling. We then mathematically reformulate the scheduling problem, using a cost matrix to capture the cost of data staging, and propose an algorithm, lsap-sched, that yields optimal data locality. In addition, we integrate fairness and data locality into a unified algorithm, lsap-fair-sched, in which users can easily adjust the tradeoff between data locality and fairness. Finally, extensive simulation experiments show that our algorithms can improve the ratio of data-local tasks by up to 14%, reduce data movement cost by up to 90%, and balance fairness and data locality effectively.
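
To make the cost-matrix formulation concrete, the following is a minimal sketch and not the authors' lsap-sched implementation. It assumes a hypothetical three-node cluster with one idle map slot per node, a 0/1 cost (0 if the slot's node holds a replica of the task's input, 1 otherwise), and a simplified greedy scheduler in which each free slot takes the cheapest remaining task. The optimal assignment is found here by brute-force enumeration purely for illustration; a polynomial-time solver for the linear sum assignment problem, such as the Hungarian algorithm, would be used at realistic sizes. All names (`replicas`, `build_cost_matrix`, `greedy_assign`, `optimal_assign`) are illustrative assumptions.

```python
from itertools import permutations

# Hypothetical input: nodes that hold a replica of each task's input block.
replicas = {
    "t0": {"n0", "n1"},
    "t1": {"n0"},
    "t2": {"n1", "n2"},
}
tasks = ["t0", "t1", "t2"]
slots = ["n0", "n1", "n2"]  # assumption: one idle map slot per node

LOCAL_COST = 0   # assumed cost when the input is on the slot's node
REMOTE_COST = 1  # assumed cost when the input must be staged over the network


def build_cost_matrix(tasks, slots, replicas):
    """cost[i][j] is the data staging cost of running task i on slot j."""
    return [
        [LOCAL_COST if s in replicas[t] else REMOTE_COST for s in slots]
        for t in tasks
    ]


def greedy_assign(cost):
    """Simplified greedy scheduler: slots ask for work in order, and each
    takes the cheapest unassigned task (ties go to the lowest task index)."""
    unassigned = list(range(len(cost)))
    assignment = {}
    for s in range(len(cost[0])):
        if not unassigned:
            break
        t = min(unassigned, key=lambda i: cost[i][s])
        assignment[t] = s
        unassigned.remove(t)
    return assignment


def optimal_assign(cost):
    """Brute-force minimum-cost assignment (requires tasks <= slots).
    Stands in for a real LSAP solver; only feasible for tiny inputs."""
    n_tasks, n_slots = len(cost), len(cost[0])
    best, best_cost = None, None
    for perm in permutations(range(n_slots), n_tasks):
        c = sum(cost[t][perm[t]] for t in range(n_tasks))
        if best_cost is None or c < best_cost:
            best, best_cost = perm, c
    return dict(enumerate(best))


def total_cost(cost, assignment):
    """Sum of staging costs over all task-to-slot pairs in the assignment."""
    return sum(cost[t][s] for t, s in assignment.items())


cost = build_cost_matrix(tasks, slots, replicas)
greedy = greedy_assign(cost)
optimal = optimal_assign(cost)
print("greedy cost:", total_cost(cost, greedy))
print("optimal cost:", total_cost(cost, optimal))
```

In this toy instance the greedy pass gives node n0 the task t0, which leaves t1 (whose only replica is on n0) to run remotely, whereas the global assignment keeps every task data-local. This is the kind of gap that motivates treating scheduling as a single assignment problem over the whole cost matrix.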