PEGASUS: mining peta-scale graphs

Authors:
U Kang; Charalampos E. Tsourakakis; Christos Faloutsos
Affiliations:
Carnegie Mellon University, School of Computer Science, Department of Computer Science, Pittsburgh, PA 15213, USA (all three authors)
Venue:
Knowledge and Information Systems - Special Issue: Best Papers of the Fifth International Conference on Advanced Data Mining and Applications (ADMA 2009)
Year:
2011

Cited by 13 papers:
HADI: Mining Radii of Large Graphs. ACM Transactions on Knowledge Discovery from Data (TKDD).
CloudVista: visual cluster exploration for extreme scale data in the cloud. SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management.
MadLINQ: large-scale distributed matrix computation for the cloud. Proceedings of the 7th ACM European conference on Computer Systems.
Bimodal invitation-navigation fair bets model for authority identification in a social network. Proceedings of the 21st international conference on World Wide Web.
Managing and mining large graphs: patterns and algorithms. SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data.
MapReduce algorithms for big data analysis. Proceedings of the VLDB Endowment.
Fault tolerance logical network properties of irregular graphs. ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I.
Ligra: a lightweight graph processing framework for shared memory. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming.
Graph-based semi-supervised learning with multi-modality propagation for large-scale image datasets. Journal of Visual Communication and Image Representation.
Using Pregel-like Large Scale Graph Processing Frameworks for Social Network Analysis. ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012).
Partial view selection for evolving social graphs. First International Workshop on Graph Data Management Experiences and Systems.
The family of MapReduce and large-scale data processing systems. ACM Computing Surveys (CSUR).
TAO: Facebook's distributed data store for the social graph. USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference.

Abstract

In this paper, we describe PeGaSus, an open-source Peta Graph Mining library that performs typical graph mining tasks such as computing the diameter of a graph, computing the radius of each node, finding connected components, and computing the importance scores of nodes. As graphs grow to gigabytes, terabytes, or even petabytes, the need for such a library grows too. To the best of our knowledge, PeGaSus is the first such library; it is implemented on top of Hadoop, the open-source implementation of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) essentially amount to repeated matrix-vector multiplication. We describe a key primitive of PeGaSus, called GIM-V (generalized iterated matrix-vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up with the number of available machines, (b) running time linear in the number of edges, and (c) more than 5 times faster performance than the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, courtesy of Yahoo!, with ≈ 6.7 billion edges.
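The abstract names GIM-V but does not spell out how one primitive covers several tasks. The sketch below assumes the decomposition into three pluggable operations (called combine2, combineAll and assign in the full paper; treat those names and their exact semantics as an assumption here): combine2 generalizes the product m_ij * v_j, combineAll generalizes the row sum, and assign merges the result with the old v_i. The functions gimv_step, gimv, connected_components and pagerank are hypothetical, and everything runs in memory on one machine rather than as Hadoop jobs.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# A sparse matrix stored as (row i, column j, value m_ij) triples,
# i.e. an edge list; only nonzero entries are kept.
Triple = Tuple[int, int, float]


def gimv_step(triples: Sequence[Triple], v: List[float],
              combine2: Callable[[float, float], float],
              combine_all: Callable[[int, List[float]], float],
              assign: Callable[[float, float], float]) -> List[float]:
    """One generalized multiplication v' = M x_G v, computed in memory."""
    partials: Dict[int, List[float]] = {i: [] for i in range(len(v))}
    for i, j, m_ij in triples:
        # combine2 generalizes the product m_ij * v_j.
        partials[i].append(combine2(m_ij, v[j]))
    # combine_all generalizes the row sum; assign merges it with the old v_i.
    return [assign(v[i], combine_all(i, partials[i])) for i in range(len(v))]


def gimv(triples: Sequence[Triple], v: List[float], combine2, combine_all,
         assign, max_iter: int = 200, tol: float = 1e-9) -> List[float]:
    """Repeat gimv_step until no entry changes by more than tol."""
    for _ in range(max_iter):
        new_v = gimv_step(triples, v, combine2, combine_all, assign)
        if max((abs(a - b) for a, b in zip(new_v, v)), default=0.0) <= tol:
            return new_v
        v = new_v
    return v


def connected_components(n: int, edges: Iterable[Tuple[int, int]]) -> List[float]:
    """Label each node with the smallest node id in its component."""
    edges = list(edges)
    # Store both directions so labels propagate over an undirected graph.
    triples = [(a, b, 1.0) for a, b in edges] + [(b, a, 1.0) for a, b in edges]
    return gimv(triples, [float(i) for i in range(n)],
                combine2=lambda m, vj: vj,                 # pass neighbor label
                combine_all=lambda i, xs: min(xs, default=math.inf),
                assign=min)                                # keep smaller label


def pagerank(n: int, edges: Iterable[Tuple[int, int]],
             c: float = 0.85) -> List[float]:
    """PageRank over directed edges (src -> dst); dangling nodes not handled."""
    edges = list(edges)
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    # Column-normalized adjacency: m_ij = 1 / outdeg(j) for an edge j -> i.
    triples = [(dst, src, 1.0 / out_deg[src]) for src, dst in edges]
    return gimv(triples, [1.0 / n] * n,
                combine2=lambda m, vj: c * m * vj,
                combine_all=lambda i, xs: (1 - c) / n + sum(xs),
                assign=lambda old, new: new)
```

For example, connected_components(6, [(0, 1), (1, 2), (3, 4)]) labels nodes 0 to 2 with 0.0, nodes 3 and 4 with 3.0, and the isolated node 5 with 5.0. The design point is that only the three operations change between tasks while the iteration loop is shared, which is what lets a single optimized primitive serve many graph mining operations.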