ComMapReduce: An improvement of MapReduce with lightweight communication mechanisms

Authors:
Linlin Ding;Guoren Wang;Junchang Xin;Xiaoyang Wang;Shan Huang;Rui Zhang
Affiliations:
College of Information Science and Engineering, Northeastern University, China;College of Information Science and Engineering, Northeastern University, China;College of Information Science and Engineering, Northeastern University, China;College of Information Science and Engineering, Northeastern University, China;College of Information Science and Engineering, Northeastern University, China;Department of Computing and Information Systems, The University of Melbourne, Australia
Venue:
Data & Knowledge Engineering
Year:
2013

Abstract

As a parallel programming framework, MapReduce can process scalable, parallel applications over large-scale datasets. Mappers and Reducers execute independently of each other: there is no communication among Mappers, nor among Reducers. When the final result is much smaller than the original data, time is wasted processing unpromising intermediate data. We observe that simple communication mechanisms can significantly reduce this waste and thereby improve the performance of MapReduce. In this paper, we propose ComMapReduce, an efficient framework that extends and improves MapReduce for big data applications in the cloud. ComMapReduce obtains certain shared information through efficient, lightweight communication mechanisms. We propose three basic communication strategies (Lazy, Eager and Hybrid) and two optimization communication strategies (Prepositive and Postpositive) to obtain the shared information and process big data applications effectively. We also illustrate the implementations of three typical applications with large-scale datasets on ComMapReduce. Our extensive experiments demonstrate that ComMapReduce outperforms MapReduce in all metrics without affecting the existing characteristics of MapReduce.
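
The abstract does not say how the shared information is exchanged, so the following is only an illustrative sketch of the general idea, not the paper's implementation. It assumes a top-k query as the application and a single in-process object that holds a shared lower bound. The names `Coordinator`, `mapper`, `reducer` and `run_top_k` are hypothetical, and the sketch does not model the Lazy, Eager, Hybrid, Prepositive or Postpositive strategies. It shows how a small shared value can let Mappers drop intermediate data that cannot reach the final result.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple


class Coordinator:
    """Hypothetical holder of the shared information: a global lower bound."""

    def __init__(self) -> None:
        self.threshold = float("-inf")

    def report(self, local_threshold: float) -> float:
        # Keep the tightest bound seen so far and return it to the caller.
        self.threshold = max(self.threshold, local_threshold)
        return self.threshold


def mapper(split: Iterable[float], k: int, coord: Coordinator) -> List[float]:
    """Emit only the values that can still belong to the global top-k."""
    local_top = heapq.nlargest(k, split)
    # The k-th largest value of one split is a safe lower bound on the
    # k-th largest value of the whole dataset.
    local_threshold = local_top[-1] if len(local_top) == k else float("-inf")
    global_threshold = coord.report(local_threshold)
    return [v for v in local_top if v >= global_threshold]


def reducer(intermediate: Iterable[List[float]], k: int) -> List[float]:
    """Merge the surviving intermediate values and keep the k largest."""
    merged = [v for part in intermediate for v in part]
    return heapq.nlargest(k, merged)


def run_top_k(splits: Sequence[Sequence[float]], k: int) -> Tuple[List[float], int]:
    """Run the mappers one after another (an assumption made for simplicity)
    and return the top-k result plus the number of intermediate values emitted."""
    coord = Coordinator()
    emitted = [mapper(s, k, coord) for s in splits]
    return reducer(emitted, k), sum(len(e) for e in emitted)


if __name__ == "__main__":
    result, shipped = run_top_k([[9, 8, 7, 1], [6, 5, 4, 3], [10, 2, 2, 2]], k=2)
    print(result, shipped)
```

In this toy run each Mapper would emit two values without the shared bound, six in total. With it, later Mappers discard values below the bound reported by earlier ones, so fewer intermediate values reach the Reducer while the final answer is unchanged.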