SUMMA: subgraph matching in massive graphs

Authors:
Shijie Zhang;Shirong Li;Jiong Yang
Affiliations:
Case Western Reserve University, Cleveland, OH, USA;Case Western Reserve University, Cleveland, OH, USA;Case Western Reserve University, Cleveland, OH, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Citing 3
Cited 3

Trading off space for passes in graph streaming problems

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Scalable network distance browsing in spatial databases

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
GADDI: distance index based subgraph matching in biological networks

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

DELTA: indexing and querying multi-labeled graphs

Proceedings of the 20th ACM international conference on Information and knowledge management
STUN: Spatio-Temporal Uncertain (Social) Networks

ASONAM '12 Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012)
Efficient Multiview Maintenance under Insertion in Huge Social Networks

ACM Transactions on the Web (TWEB)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Graphs can represent a large number of data types, e.g., online social networks, internet links, procedure dependency graphs, etc. The need for indexing massive graphs is an urgent research problem of great practical importance. The main challenge is the size. Each graph may contain at least tens of millions vertices. The working memory may not be able to store the database graph due to its large size, which increases the processing time significantly. We propose a novel index based subgraph matching scheme, namely SUMMA, for graph querying in massive graphs. We devise two novel indices which capture both local and global information of the database graph. SUMMA is further optimized by the use of a matching scheme to reduce redundant calculations and disk accesses. Last but not least, a number of synthetic datasets are used to evaluate the efficiency and scalability of our proposed method.