COSI: Cloud Oriented Subgraph Identification in Massive Social Networks

Authors:
Matthias Brocheler; Andrea Pugliese; V. S. Subrahmanian
Venue:
ASONAM '10 Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining
Year:
2010

Citing 0
Cited 8

Neighborhood based fast graph search in large networks
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

An edge-based framework for fast subgraph matching in a large graph
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I

Crunching large graphs with commodity processors
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism

Managing large dynamic graphs efficiently
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

Towards effective partition management for large graphs
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

Adaptive optimizations of recursive queries in Teradata
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

Inexact subgraph isomorphism in MapReduce
Journal of Parallel and Distributed Computing

Efficient Multiview Maintenance under Insertion in Huge Social Networks
ACM Transactions on the Web (TWEB)

Abstract

Subgraph matching is a key operation on graph data. Social network (SN) providers may want to find all subgraphs within their social network that match certain query graph patterns. Unfortunately, subgraph matching is NP-complete, making its application to massive SNs a major challenge. Past work has shown how to implement subgraph matching on a single processor when the graph has 10-25M edges. In this paper, we show how to use cloud computing in conjunction with such existing single processor methods to efficiently match complex subgraphs on graphs as large as 778M edges. A cloud consists of one master compute node and k slave compute nodes. We first develop a probabilistic method to estimate probabilities that a vertex will be retrieved by a random query and that a pair of vertices will be successively retrieved by a random query. We use these probability estimates to define edge weights in an SN and to compute minimal edge cuts to partition the graph amongst k slave nodes. We develop algorithms for both master and slave nodes that try to minimize communication overhead. The resulting COSI system can answer complex queries over real-world SN data containing over 778M edges very efficiently.
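
The partitioning idea in the abstract (weight each edge by how likely its endpoints are to be retrieved together, then split the graph across k nodes so that little weight is cut) can be illustrated with a small sketch. This is not the COSI algorithm: the abstract does not give the probability estimator or the cut procedure, so the code below substitutes a simple greedy heuristic. The edge weights, the `capacity` parameter, and the function names `greedy_partition` and `cut_weight` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def greedy_partition(edges, k, capacity):
    """Assign each vertex to one of k partitions (hypothetical greedy heuristic).

    edges: dict mapping (u, v) -> weight, where a higher weight stands in for
           "more likely to be retrieved successively by a query" (assumed values).
    capacity: maximum number of vertices per partition, to keep them balanced.
    """
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w
    if k * capacity < len(adj):
        raise ValueError("k * capacity is too small to hold every vertex")

    # Place vertices with the heaviest total incident weight first.
    order = sorted(adj, key=lambda x: (-sum(adj[x].values()), x))
    assignment = {}
    sizes = [0] * k
    for v in order:
        # Affinity to a partition = weight of edges to vertices already placed there.
        affinity = [0.0] * k
        for n, w in adj[v].items():
            if n in assignment:
                affinity[assignment[n]] += w
        candidates = [p for p in range(k) if sizes[p] < capacity]
        # Prefer high affinity, then the emptier partition, then the lower index.
        best = max(candidates, key=lambda p: (affinity[p], -sizes[p], -p))
        assignment[v] = best
        sizes[best] += 1
    return assignment


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints land in different partitions."""
    return sum(w for (u, v), w in edges.items() if assignment[u] != assignment[v])


# Two tightly connected triangles joined by one light edge (made-up integer weights).
edges = {
    ("a", "b"): 9, ("b", "c"): 8, ("a", "c"): 7,
    ("d", "e"): 9, ("e", "f"): 8, ("d", "f"): 7,
    ("c", "d"): 1,
}
assignment = greedy_partition(edges, k=2, capacity=3)
print(assignment, cut_weight(edges, assignment))
```

On this toy input the heuristic keeps each triangle on one partition and cuts only the light edge between them, which is the behavior a low-communication partition aims for. A real deployment would need a proper balanced min-cut or multilevel partitioner rather than this greedy pass.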