MR-runner: a modularized map-reduce job management tool

  • Authors:
  • Xinsheng Yang; Wei Wang; Lijie Xu; Jie Liu; Jun Wei

  • Affiliations:
  • Chinese Academy of Sciences, Beijing, P.R. China;Chinese Academy of Sciences, Beijing, P.R. China;Chinese Academy of Sciences, Beijing, P.R. China;Chinese Academy of Sciences, Beijing, P.R. China;Chinese Academy of Sciences, Beijing, P.R. China

  • Venue:
  • Proceedings of the 5th Asia-Pacific Symposium on Internetware
  • Year:
  • 2013

Abstract

Map-Reduce is a powerful solution for processing and analyzing large-scale data; frameworks such as Hadoop and Spark can handle terabytes of data or more. Users only need to implement the "map" and "reduce" functions, and the Map-Reduce framework can then complete a wide variety of jobs. However, many machine learning and data mining algorithms cannot leverage the Map-Reduce framework, or can do so only after substantial modification of the algorithm itself. There are two main reasons: 1. Map-Reduce is a batch operation, so most Map-Reduce frameworks have no built-in support for iteration. 2. Map-Reduce is fully parallel: no single node can access all records, so none of them can compute a globally optimal model. In this paper, we propose a job management tool, MR-Runner, that enables Map-Reduce frameworks to support iteration and a mode we call "de-parallel". This allows Map-Reduce frameworks such as Hadoop to run more algorithms and support a wider variety of tasks. In addition, our tool does not modify the Map-Reduce framework itself. In fact, MR-Runner interacts with the Map-Reduce framework as a "client", so it can be deployed on any single PC rather than on the Map-Reduce cluster. We also abstract the main interfaces related to Map-Reduce frameworks, which makes our tool portable across representative Map-Reduce frameworks.
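The client-side design described in the abstract can be illustrated with a minimal sketch. It is not MR-Runner's actual API: `MapReduceBackend`, `LocalBackend`, `run_job` and `run_iterative` are hypothetical names, and `LocalBackend` merely simulates a cluster in-process where a real adapter would submit jobs to Hadoop or Spark. Under these assumptions, iteration is a client loop that submits one job per round, and the "de-parallel" idea is modeled as a `merge` step that runs on the client over all reducer outputs, so it can build one global model. A 1-D k-means serves as the example algorithm.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Callable, Iterable, List, Tuple

KV = Tuple[Any, Any]


class MapReduceBackend(ABC):
    """Hypothetical interface hiding which Map-Reduce framework runs a job."""

    @abstractmethod
    def run_job(self, records: Iterable[Any],
                mapper: Callable[[Any], Iterable[KV]],
                reducer: Callable[[Any, List[Any]], Any]) -> List[Any]:
        """Run one map -> shuffle -> reduce job and return the reducer outputs."""


class LocalBackend(MapReduceBackend):
    """In-process stand-in for a cluster (illustration only)."""

    def run_job(self, records, mapper, reducer):
        groups = defaultdict(list)          # shuffle: group mapped values by key
        for record in records:
            for key, value in mapper(record):
                groups[key].append(value)
        return [reducer(key, values) for key, values in sorted(groups.items())]


def run_iterative(backend, records, make_mapper, reducer, merge, converged,
                  model, max_iters=100):
    """Client-side driver: submits one Map-Reduce job per iteration.

    `merge` is the assumed "de-parallel" step: it runs on the client and sees
    every reducer output, so it can assemble a single global model.
    """
    for i in range(1, max_iters + 1):
        partials = backend.run_job(records, make_mapper(model), reducer)
        new_model = merge(partials, model)
        if converged(model, new_model):
            return new_model, i
        model = new_model
    return model, max_iters


# Example algorithm: 1-D k-means with two centroids.
def make_mapper(centroids):
    def mapper(x):
        nearest = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        yield nearest, (x, 1)               # emit (cluster id, (value, count))
    return mapper


def reducer(key, values):
    total = sum(v for v, _ in values)
    count = sum(c for _, c in values)
    return key, total / count               # new centroid for this cluster


def merge(partials, old):
    centroids = list(old)                   # empty clusters keep their old centroid
    for key, centroid in partials:
        centroids[key] = centroid
    return tuple(centroids)


def converged(old, new, tol=1e-9):
    return max(abs(a - b) for a, b in zip(old, new)) < tol


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, iterations = run_iterative(LocalBackend(), data, make_mapper,
                                      reducer, merge, converged, (0.0, 5.0))
print(centroids, iterations)  # (2.0, 11.0) 3
```

Keeping the loop and the merge on the client mirrors the abstract's claim that the tool needs no changes to the framework itself: only `run_job` would differ between backends.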