Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
SCOPE: easy and efficient parallel processing of massive data sets
Proceedings of the VLDB Endowment
Hive: a warehousing solution over a map-reduce framework
Proceedings of the VLDB Endowment
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Spark: cluster computing with working sets
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
HaLoop: efficient iterative data processing on large clusters
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
Map-Reduce is a powerful solution for processing and analyzing large-scale data. Just as Hadoop and Spark are able to deal with terabyte data and even more. Users only need to complete "map" and "reduce" function, the Map-Reduce framework can finish variety jobs. But many machine learning and data mining algorithms cannot leverage the Map-Reduce framework or it would take large efforts to modify the algorithm itself. This issue can be explained by the following ways: 1. Map-Reduce is a batch operation so that most of Map-Reduce frameworks do not built-in to support iteration. 2. Map-Reduce is absolutely parallel, each vertex cannot obtain all records, so none of them could get the global optimal model. In this paper, we proposed a job management tool to enable the Map-Reduce framework to support iteration, called "de-parallel". This make the Map-Reduce framework like Hadoop so that Map-Reduce could run more algorithms and support more various tasks. In addition, our tool does not modify the Map-Reduce framework itself. In face MR-Runner interacts with Map-Reduce framework like a "client", therefore MR-Runner could be deployed in any single PC instead of Map-Reduce cluster. We also abstract the mainly interface related to Map-Reduce frameworks, this makes our tool portable to the representative Map-Reduce frameworks.