Accelerate large-scale iterative computation through asynchronous accumulative updates

  • Authors:
  • Yanfeng Zhang;Qixin Gao;Lixin Gao;Cuirong Wang

  • Affiliations:
  • Northeastern University & University of Massachusetts Amherst, Shenyang, China;Northeastern University, Qinhuangdao, China;University of Massachusetts Amherst, Amherst, USA;Northeastern University, Qinhuangdao, China

  • Venue:
  • Proceedings of the 3rd workshop on Scientific Cloud Computing
  • Year:
  • 2012

Abstract

A myriad of data mining algorithms in scientific computing require parsing data sets iteratively. These iterative algorithms have to be implemented in a distributed environment to scale to massive data sets. To accelerate iterative computations in a large-scale distributed environment, we identify a broad class of iterative computations that can accumulate iterative update results. Specifically, unlike traditional iterative computations, which update the result based on the result from the previous iteration, an accumulative iterative update accumulates the intermediate update results. We prove that an accumulative update will yield the same result as its corresponding traditional iterative update. Furthermore, accumulative iterative computation can be performed asynchronously and converges much faster. We present a general computation model to describe asynchronous accumulative iterative computation. Based on the computation model, we design and implement a distributed framework, Maiter. We evaluate Maiter on the Amazon EC2 cloud with 100 EC2 instances. Our results show that Maiter achieves as much as a 60x speedup over Hadoop for implementing iterative algorithms.
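
The contrast between a traditional and an accumulative update can be illustrated with a small single-machine sketch. The abstract does not give an example algorithm or API, so everything below is an assumption made for illustration: the choice of an unnormalized PageRank-style computation, the 4-node graph, the damping value, and the function names. It is not Maiter's interface or implementation. The traditional version recomputes every value from the previous iteration's values; the accumulative version keeps a pending change (delta) per node, folds it into the node's value, and forwards the damped delta to the node's out-neighbours, one node at a time in arbitrary order.

```python
import random

# Hypothetical 4-node directed graph: node -> list of out-neighbours.
# Every node has at least one out-neighbour, so no dangling-node handling is needed.
GRAPH = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
DAMPING = 0.8  # illustrative value, not taken from the paper


def traditional_pagerank(graph, damping, iterations=200):
    """Synchronous update: each iteration rebuilds all ranks from the previous ranks."""
    rank = {v: 1.0 - damping for v in graph}
    for _ in range(iterations):
        new_rank = {v: 1.0 - damping for v in graph}
        for u, outs in graph.items():
            share = damping * rank[u] / len(outs)
            for v in outs:
                new_rank[v] += share
        rank = new_rank
    return rank


def accumulative_pagerank(graph, damping, tolerance=1e-12, seed=0):
    """Asynchronous accumulative update, simulated by visiting nodes in random order.

    Each node accumulates its pending delta into its rank and passes the
    damped delta on to its out-neighbours. No global iteration barrier is used.
    """
    rng = random.Random(seed)
    rank = {v: 0.0 for v in graph}
    delta = {v: 1.0 - damping for v in graph}
    nodes = list(graph)
    while max(delta.values()) > tolerance:
        u = rng.choice(nodes)
        pending, delta[u] = delta[u], 0.0
        rank[u] += pending  # accumulate the intermediate update
        share = damping * pending / len(graph[u])
        for v in graph[u]:
            delta[v] += share  # propagate only the change, not the full value
    return rank


if __name__ == "__main__":
    sync_result = traditional_pagerank(GRAPH, DAMPING)
    async_result = accumulative_pagerank(GRAPH, DAMPING)
    for node in GRAPH:
        print(node, sync_result[node], async_result[node])
```

In this sketch the two functions agree up to the chosen tolerance regardless of the random visiting order, which mirrors the equivalence claim in the abstract for this one linear case only. The random order stands in for asynchrony; a distributed system would additionally have to handle message passing and termination detection, which the sketch leaves out.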