Beyond online aggregation: parallel and incremental data mining with online Map-Reduce

  • Authors:
  • Joos-Hendrik Böse; Artur Andrzejak; Mikael Högqvist

  • Affiliations:
  • Intl. Comp. Sci. Institute, Berkeley; Zuse Institute Berlin (ZIB), Berlin, Germany; Zuse Institute Berlin (ZIB), Berlin, Germany

  • Venue:
  • Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
  • Year:
  • 2010

Abstract

Only a few data mining algorithms work in a massively parallel and yet online (i.e., incremental) fashion. Combining both features is essential for mining large data streams, and it adds scalability to the concept of Online Aggregation introduced by J. M. Hellerstein et al. in 1997. We show how an online version of the Map-Reduce programming model can be used to implement such algorithms, and propose a solution for the "hardest" class of these algorithms: those requiring multiple Map-Reduce phases. An experimental evaluation confirms that the proposed methods can substantially accelerate interactive analysis of large data sets and facilitate scalable stream mining.
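
To make the idea of online (incremental) aggregation concrete, here is a minimal single-process sketch in Python. It is not the authors' system or API: the function names (`online_map_reduce`, `map_mean`, `combine_mean`, `running_means`) and the chunked input are assumptions made purely for illustration. The sketch covers only a single Map-Reduce phase and shows how a reduce state can be updated chunk by chunk, so that an approximate result is available before all data has been read.

```python
from typing import Callable, Dict, Hashable, Iterable, Iterator, Tuple

Pair = Tuple[Hashable, object]


def online_map_reduce(
    chunks: Iterable[Iterable[object]],
    map_fn: Callable[[object], Iterable[Pair]],
    combine_fn: Callable[[object, object], object],
) -> Iterator[Dict[Hashable, object]]:
    """Hypothetical helper: yield a snapshot of the reduced state after each chunk.

    combine_fn must be associative so that partial results can be merged
    in any grouping as more data arrives.
    """
    state: Dict[Hashable, object] = {}
    for chunk in chunks:
        for record in chunk:
            for key, value in map_fn(record):
                # Fold the new value into the running state for this key.
                state[key] = combine_fn(state[key], value) if key in state else value
        # Emit a copy so later updates do not change earlier snapshots.
        yield dict(state)


def map_mean(record):
    """Map a (key, number) record to a (sum, count) partial aggregate."""
    key, x = record
    yield key, (x, 1)


def combine_mean(a, b):
    """Merge two (sum, count) partial aggregates."""
    return (a[0] + b[0], a[1] + b[1])


def running_means(chunks):
    """Turn each snapshot of (sum, count) pairs into per-key mean estimates."""
    for snapshot in online_map_reduce(chunks, map_mean, combine_mean):
        yield {key: total / count for key, (total, count) in snapshot.items()}


# Illustrative input: three chunks arriving one after another.
chunks = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0)],
    [("b", 8.0), ("a", 5.0)],
]
estimates = list(running_means(chunks))
for step, estimate in enumerate(estimates, start=1):
    print(step, estimate)
```

Each yielded dictionary is an estimate based on the data seen so far, and the last one equals the exact per-key mean. Keeping `(sum, count)` pairs rather than the mean itself is what makes the merge step associative in this sketch. Algorithms that need several Map-Reduce phases, the case the abstract calls the "hardest", are outside its scope.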