On saying "enough already!" in MapReduce

  • Authors: Christos Doulkeridis; Kjetil Nørvåg

  • Affiliations: Norwegian University of Science and Technology, Sem Sælandsvei, Trondheim, Norway (both authors)

  • Venue: Proceedings of the 1st International Workshop on Cloud Intelligence
  • Year: 2012

Abstract

The MapReduce framework for parallel processing of massive data sets has attracted considerable attention recently, mainly because of its scalability, simplicity, and fault tolerance. Despite these merits, MapReduce follows a brute-force approach that often performs redundant work. This is particularly evident for rank-aware queries such as top-k, where the result set consists of a bounded set of k tuples. To produce the correct result for such a query, MapReduce must access the input data in its entirety. To address this lack of early termination, we investigate different techniques that allow efficient processing of rank-aware queries without accessing the input data exhaustively. We present several individual approaches that can be combined, and we demonstrate their advantages and shortcomings. We thus provide the first steps towards integrating efficient rank-aware processing in MapReduce.
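To make the contrast in the abstract concrete, the following is a minimal sketch in plain Python (not Hadoop code, and not the authors' technique) of a top-k query processed in a map/reduce style. It compares a brute-force variant that scans every record with a variant that stops reading early. All function names are hypothetical. The early-termination variant rests on two assumptions that the abstract does not state: each partition is already sorted by descending score, and a threshold that is a valid lower bound on the global k-th best score is available. The sketch takes that threshold from the k-th best score of a single partition.

```python
import heapq
from typing import Iterable, List, Tuple

Record = Tuple[str, float]  # (tuple id, score); a higher score ranks first


def map_local_top_k(partition: Iterable[Record], k: int) -> List[Record]:
    """Brute-force map task: scan the whole partition, keep its k best records."""
    return heapq.nlargest(k, partition, key=lambda r: r[1])


def map_sorted_with_threshold(
    partition: List[Record], k: int, threshold: float
) -> Tuple[List[Record], int]:
    """Map task over a partition ASSUMED to be sorted by descending score.

    Stops as soon as k records were emitted or a score drops below the
    threshold. Returns the emitted records and the number of records read.
    """
    emitted: List[Record] = []
    records_read = 0
    for record in partition:
        if len(emitted) == k:
            break  # this partition cannot contribute more than k results
        records_read += 1
        if record[1] < threshold:
            break  # sorted input: every later record scores lower still
        emitted.append(record)
    return emitted, records_read


def reduce_top_k(candidate_lists: Iterable[List[Record]], k: int) -> List[Record]:
    """Reduce task: merge the candidates of all map tasks, keep the k best."""
    merged = [record for candidates in candidate_lists for record in candidates]
    return heapq.nlargest(k, merged, key=lambda r: r[1])


def top_k_brute_force(partitions: List[List[Record]], k: int) -> List[Record]:
    """Every record of every partition is read."""
    return reduce_top_k([map_local_top_k(p, k) for p in partitions], k)


def top_k_early_termination(
    partitions: List[List[Record]], k: int, threshold: float
) -> Tuple[List[Record], int]:
    """Returns the top-k result and the total number of records read."""
    candidate_lists = []
    total_read = 0
    for partition in partitions:
        emitted, records_read = map_sorted_with_threshold(partition, k, threshold)
        candidate_lists.append(emitted)
        total_read += records_read
    return reduce_top_k(candidate_lists, k), total_read


if __name__ == "__main__":
    partitions = [
        [("a", 0.95), ("b", 0.80), ("c", 0.40), ("d", 0.10)],
        [("e", 0.90), ("f", 0.30), ("g", 0.20), ("h", 0.05)],
        [("i", 0.85), ("j", 0.70), ("k", 0.60), ("l", 0.50)],
    ]
    k = 2
    # Assumption: the k-th best score of any one partition bounds the
    # global k-th best score from below, so it is a safe threshold.
    threshold = partitions[0][k - 1][1]
    print(top_k_brute_force(partitions, k))
    print(top_k_early_termination(partitions, k, threshold))
```

On this toy input both variants return the same two tuples, but the early-termination variant reads 6 of the 12 records. How much is saved in practice depends on how tight the threshold is and on whether sorted access to the input is available at all, which this sketch simply assumes.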