Evaluating use of data flow systems for large graph analysis

  • Authors:
  • Andy Yoo; Ian Kaplan

  • Affiliations:
  • Lawrence Livermore National Laboratory, Livermore, CA; Lawrence Livermore National Laboratory, Livermore, CA

  • Venue:
  • Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
  • Year:
  • 2009

Abstract

Large graph analysis has become increasingly important and is widely used in applications such as web mining, social network analysis, biology, and information retrieval. However, the typically high computational complexity of commonly used graph algorithms, together with the large volume of data these applications frequently encounter, makes scalable graph analysis a non-trivial task. Recently, more and more of these graph algorithms have been implemented as dataflow applications, in which many tasks perform their assigned operations in parallel, independently of one another. These applications are run on large-scale computing platforms to combine the data parallelism enabled by the dataflow model with the high computing power and large storage capacity of increasingly affordable high-end computers. In this paper, we evaluate the potential of the many-task concept, in the form of a dataflow system, for large graph analysis applications by studying the performance of complicated graph algorithms on an actual dataflow machine. We have found that a dataflow system can achieve orders-of-magnitude performance improvements over state-of-the-art database systems and can serve as a viable, scalable graph analysis engine.