ComMapReduce: An Improvement of MapReduce with Lightweight Communication Mechanisms

  • Authors:
  • Linlin Ding;Junchang Xin;Guoren Wang;Shan Huang

  • Affiliations:
  • Key Laboratory of Medical Image Computing (NEU), Ministry of Education, P.R. China and College of Information Science & Engineering, Northeastern University, P.R. China (all authors)

  • Venue:
  • DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part II
  • Year:
  • 2012

Abstract

As a parallel programming model, MapReduce processes scalable, parallel applications with huge amounts of data on large clusters. The MapReduce framework provides no communication mechanisms among Mappers, nor among Reducers. When the final result is much smaller than the original data, processing unpromising intermediate data objects wastes time. We observe that this waste can be avoided with simple communication mechanisms. In this paper, we propose ComMapReduce, a framework that extends and improves MapReduce for efficient query processing of massive data in the cloud. Using efficient, lightweight communication mechanisms, ComMapReduce filters unpromising intermediate data objects in the Map phase and thereby decreases the input of the Reduce phase. Three communication strategies, Lazy, Eager and Hybrid, are proposed to filter the unpromising intermediate results of the Map phase. In addition, two optimization strategies, Prepositive and Postpositive, are presented to enhance query-processing performance by filtering more candidate data objects. Our extensive experiments on different synthetic datasets demonstrate that the ComMapReduce framework outperforms the original MapReduce framework in all metrics without affecting its existing characteristics.
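
The filtering idea in the abstract can be illustrated with a small, single-process sketch of a top-k query. This is not the authors' implementation: the `Coordinator` class, the `exchange`, `run_mapper` and `top_k` names, and the choice of a top-k query are all assumptions made for illustration. The abstract does not describe how the Lazy, Eager and Hybrid strategies differ, so the sketch does not attempt to model them. It only shows how one shared threshold lets Mappers discard intermediate values that cannot appear in the final result.

```python
import heapq


class Coordinator:
    """Hypothetical shared node that holds the strongest filter value seen so far."""

    def __init__(self):
        self.global_filter = float("-inf")

    def exchange(self, local_filter):
        # Keep the largest threshold reported by any mapper and return it.
        self.global_filter = max(self.global_filter, local_filter)
        return self.global_filter


def run_mapper(split, k, coordinator):
    """Return the local top-k of one split, pruned by the shared threshold."""
    local_top = heapq.nlargest(k, split)
    # The k-th largest local value is a safe lower bound on the global
    # k-th largest value, but only if the split holds at least k values.
    local_filter = local_top[-1] if len(local_top) == k else float("-inf")
    threshold = coordinator.exchange(local_filter)
    # Values below the threshold cannot be in the global top-k.
    return [v for v in local_top if v >= threshold]


def top_k(splits, k, use_filter=True):
    """Simulate the mappers sequentially, then merge in a single reducer.

    Returns the top-k values and the number of intermediate values
    that reached the reducer.
    """
    coordinator = Coordinator()
    intermediate = []
    for split in splits:
        if use_filter:
            intermediate.extend(run_mapper(split, k, coordinator))
        else:
            intermediate.extend(heapq.nlargest(k, split))
    return heapq.nlargest(k, intermediate), len(intermediate)


if __name__ == "__main__":
    splits = [[11, 10, 4, 0], [8, 7, 2, 6], [5, 1, 9, 3]]
    print(top_k(splits, 2, use_filter=False))
    print(top_k(splits, 2, use_filter=True))
```

In this sketch the amount of pruning depends on the order in which mappers report: a mapper only benefits from thresholds that were published before it exchanges its own. Both variants return the same top-k values; they differ only in how many intermediate values reach the reducer.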