As a parallel programming framework, MapReduce can run scalable, parallel applications over large-scale datasets. Mappers and Reducers execute independently of one another: there is no communication among Mappers, nor among Reducers. When the final results are much smaller than the original data, much of the processing time is wasted on unpromising intermediate data. We observe that simple communication mechanisms can significantly reduce this waste and thereby improve the performance of MapReduce. In this paper, we propose ComMapReduce, an efficient framework that extends and improves MapReduce for big data applications in the cloud. ComMapReduce obtains shared information through efficient, lightweight communication mechanisms. We propose three basic communication strategies, Lazy, Eager and Hybrid, and two optimized communication strategies, Prepositive and Postpositive, to obtain this shared information and process big data applications effectively. We also describe the implementations of three typical applications with large-scale datasets on ComMapReduce. Our extensive experiments demonstrate that ComMapReduce outperforms MapReduce in all the metrics evaluated, without affecting MapReduce's existing characteristics.
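To make the idea of "shared information" that filters unpromising intermediate data concrete, here is a minimal single-process sketch. It assumes a top-k selection query, chosen only for illustration. A hypothetical `Coordinator` holds one shared value, a lower bound on the global k-th largest value. Each mapper reports its local k-th largest value and then emits only the records at or above the current shared bound. The names `Coordinator`, `mapper`, `reducer` and `run` are hypothetical. The sketch does not reproduce the paper's Lazy, Eager, Hybrid, Prepositive or Postpositive strategies, whose details the abstract does not give.

```python
import heapq
from typing import List, Tuple


class Coordinator:
    """Hypothetical coordinator holding one piece of shared information:
    a lower bound on the global k-th largest value."""

    def __init__(self) -> None:
        self.threshold = float("-inf")

    def report(self, local_kth: float) -> None:
        # A mapper's local k-th largest value can never exceed the global
        # k-th largest value, so the maximum reported value stays a valid bound.
        self.threshold = max(self.threshold, local_kth)

    def get_threshold(self) -> float:
        return self.threshold


def mapper(split: List[float], k: int, coord: Coordinator) -> List[float]:
    local_top = heapq.nlargest(k, split)
    if len(local_top) == k:  # only a full local top-k yields a valid bound
        coord.report(local_top[-1])
    cutoff = coord.get_threshold()
    # Records below the shared bound cannot be in the global top-k,
    # so they are never emitted as intermediate data.
    return [v for v in split if v >= cutoff]


def reducer(values: List[float], k: int) -> List[float]:
    return heapq.nlargest(k, values)


def run(splits: List[List[float]], k: int) -> Tuple[List[float], int]:
    """Return (global top-k, number of intermediate records emitted)."""
    coord = Coordinator()
    emitted: List[float] = []
    # Mappers are simulated sequentially here; on a real cluster they run
    # concurrently and would see the shared bound at different moments.
    for split in splits:
        emitted.extend(mapper(split, k, coord))
    return reducer(emitted, k), len(emitted)


if __name__ == "__main__":
    splits = [[20, 19, 1, 2], [5, 6, 3, 4], [7, 8, 9, 0]]
    print(run(splits, 2))  # ([20, 19], 2)
```

In this toy run only 2 of the 12 records reach the reducer. A plain local top-k per mapper, with no shared bound, would emit 6. The filter is safe because the shared bound never exceeds the true global k-th largest value, so every record of the final answer is still emitted.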