ComMapReduce: an improvement of mapreduce with lightweight communication mechanisms

Authors:
Linlin Ding;Junchang Xin;Guoren Wang;Shan Huang
Affiliations:
Key Laboratory of Medical Image Computing (NEU), Ministry of Education, P.R. China and College of Information Science & Engineering, Northeastern University, P.R. China;Key Laboratory of Medical Image Computing (NEU), Ministry of Education, P.R. China and College of Information Science & Engineering, Northeastern University, P.R. China;Key Laboratory of Medical Image Computing (NEU), Ministry of Education, P.R. China and College of Information Science & Engineering, Northeastern University, P.R. China;Key Laboratory of Medical Image Computing (NEU), Ministry of Education, P.R. China and College of Information Science & Engineering, Northeastern University, P.R. China
Venue:
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part II
Year:
2012

Citing 12
Cited 1

Map-reduce-merge: simplified relational data processing on large clusters

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Pig latin: a not-so-foreign language for data processing

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A comparison of approaches to large-scale data analysis

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Experiences on Processing Spatial Data with MapReduce

SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
PLANET: massively parallel learning of tree ensembles with MapReduce

Proceedings of the VLDB Endowment
Hive: a warehousing solution over a map-reduce framework

Proceedings of the VLDB Endowment
HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads

Proceedings of the VLDB Endowment
Pregel: a system for large-scale graph processing

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
A comparison of join algorithms for log processing in MaPreduce

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
HaLoop: efficient iterative data processing on large clusters

Proceedings of the VLDB Endowment
Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)

Proceedings of the VLDB Endowment

ComMapReduce: An improvement of MapReduce with lightweight communication mechanisms

Data & Knowledge Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

As a parallel programming model, MapReduce processes scalable and parallel applications with huge amounts of data on large clusters. In MapReduce framework, there are no communication mechanisms among Mappers, neither are among Reducers. When the amount of final results is much smaller than the original data, it is a waste of time processing the unpromising intermediate data objects. We observe that this waste can be avoided by simple communication mechanisms. In this paper, we propose ComMapReduce, a framework that extends and improves MapReduce for efficient query processing of massive data in the cloud. With efficient lightweight communication mechanisms, ComMapReduce can effectively filter the unpromising intermediate data objects in Map phase so as to decrease the input of Reduce phase specifically. Three communication strategies, Lazy, Eager and Hybrid, are proposed to filter the unpromising intermediate results of Map phase. In addition, two optimization strategies, Prepositive and Postpositive, are presented to enhance the performance of query processing by filtering more candidate data objects. Our extensive experiments on different synthetic datasets demonstrate that ComMapReduce framework outperforms the original MapReduce framework in all metrics without affecting its existing characteristics.