i2MapReduce: incremental iterative MapReduce

  • Authors: Yanfeng Zhang; Shimin Chen

  • Affiliations: Northeastern University, China; Chinese Academy of Sciences

  • Venue: Proceedings of the 2nd International Workshop on Cloud Intelligence
  • Year: 2013

Abstract

Cloud intelligence applications often perform iterative computations (e.g., PageRank) on constantly changing data sets (e.g., the Web graph). While previous studies extend MapReduce for efficient iterative computation, rerunning an entire large-scale iterative MapReduce job is too expensive a way to accommodate new changes to the underlying data in a timely manner. In this paper, we propose i2MapReduce to support incremental iterative computation. We observe that in many cases the changes affect only a very small fraction of the data set, and the newly converged state is quite close to the previously converged state. i2MapReduce exploits this observation to save re-computation by starting from the previously converged state and performing incremental updates on the changed data. Our preliminary results are promising: i2MapReduce shows a significant performance improvement over re-computing iterative jobs in MapReduce.
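
The idea of starting from the previously converged state can be illustrated with a small in-memory sketch. It is not the i2MapReduce implementation: it is a plain warm-started PageRank power iteration on a toy graph, it recomputes every node in each iteration rather than only the changed data, and it does not model MapReduce at all. The graph shape, the function names (pagerank, build_graph), the damping factor, and the tolerance are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# node -> list of out-neighbours (assumption: every node has at least one out-link)
Graph = Dict[int, List[int]]


def pagerank(graph: Graph,
             init: Optional[Dict[int, float]] = None,
             damping: float = 0.85,
             tol: float = 1e-10,
             max_iters: int = 1000) -> Tuple[Dict[int, float], int]:
    """Power iteration; returns (ranks, number of iterations performed)."""
    n = len(graph)
    # Cold start uses the uniform vector; warm start reuses a given state.
    ranks = dict(init) if init is not None else {v: 1.0 / n for v in graph}
    for iteration in range(1, max_iters + 1):
        new_ranks = {v: (1.0 - damping) / n for v in graph}
        for u, outs in graph.items():
            share = damping * ranks[u] / len(outs)
            for v in outs:
                new_ranks[v] += share
        # L1 distance between successive states is the convergence measure.
        delta = sum(abs(new_ranks[v] - ranks[v]) for v in graph)
        ranks = new_ranks
        if delta < tol:
            return ranks, iteration
    return ranks, max_iters


def build_graph(n: int = 50) -> Graph:
    """Hypothetical toy graph: each node links to its ring successor and to hub node 0."""
    return {i: sorted({(i + 1) % n, 0} - {i}) for i in range(n)}


# Converge once on the original graph.
old_graph = build_graph()
old_ranks, _ = pagerank(old_graph)

# Small change to the data set: one new edge, 25 -> 10.
new_graph = {u: list(vs) for u, vs in old_graph.items()}
new_graph[25].append(10)

# Re-computation from scratch versus starting from the old converged state.
cold_ranks, cold_iters = pagerank(new_graph)
warm_ranks, warm_iters = pagerank(new_graph, init=old_ranks)
print("cold start iterations:", cold_iters)
print("warm start iterations:", warm_iters)
```

On this toy graph the warm start reaches the same ranks in fewer iterations, because a single added edge leaves the old converged state close to the new one. The sketch says nothing about the size of the speedup on real workloads.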