Evaluating use of data flow systems for large graph analysis

Authors: Andy Yoo; Ian Kaplan
Affiliation: Lawrence Livermore National Laboratory, Livermore, CA
Venue: Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
Year: 2009

Abstract

Large graph analysis has become increasingly important and is widely used in applications such as web mining, social network analysis, biology, and information retrieval. However, the high computational complexity of commonly used graph algorithms and the large volumes of data these applications frequently encounter make scalable graph analysis a non-trivial task. Recently, a growing number of these graph algorithms have been implemented as dataflow applications, in which many tasks perform their assigned operations in parallel, independently of one another. These applications run on large-scale computing platforms to combine the data parallelism enabled by the dataflow model with the high computing power and large storage capacity of increasingly affordable high-end computers. In this paper, we evaluate the potential of the many-task concept, in the form of a dataflow system, for large graph analysis applications by studying the performance of complicated graph algorithms on an actual dataflow machine. We find that a dataflow system can achieve orders-of-magnitude performance improvements over state-of-the-art database systems and can serve as a viable, scalable graph analysis engine.